
digitalmars.D - Re: why allocators are not discussed here

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Wed, Jun 26, 2013 at 01:16:31AM +0200, Adam D. Ruppe wrote:
 On Tuesday, 25 June 2013 at 22:50:55 UTC, H. S. Teoh wrote:
And maybe (b) can be implemented by making gc_alloc / gc_free
overridable function pointers? Then we can override their values
and use scope guards to revert them back to the values they were
before.

 Yea, I was thinking this might be a way to go. You'd have a global
 (well, thread-local) allocator instance that can be set and reset
 through stack calls. You'd want it to be RAII or delegate based, so
 the scope is clear.
 
 with_allocator(my_alloc, { do whatever here });
 
 or
 
 {
     ChangeAllocator!my_alloc dummy;
     do whatever here
 } // dummy's destructor ends the allocator scope
 
 I think the former is a bit nicer, since the dummy variable is a bit
 silly. We'd hope that delegate can be inlined.
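A minimal sketch of both scoping styles, assuming the allocator is just a pair of function pointers held in a thread-local variable. `Allocator`, `currentAllocator`, `withAllocator`, `ChangeAllocator`, `mallocAlloc` and `mallocFree` are hypothetical names, not druntime or Phobos APIs, and nothing here actually reroutes `new` or GC allocations; it only shows how `scope(exit)` and a destructor each restore the previous allocator. Unlike the quoted `ChangeAllocator!my_alloc`, this version takes the allocator as a runtime argument.

```d
import core.stdc.stdlib : free, malloc;

// Hypothetical allocator interface: a pair of plain D function pointers.
struct Allocator {
    void* function(size_t) alloc;
    void function(void*) release;
}

// Module-level variables are thread-local by default in D,
// so each thread has its own current allocator.
Allocator currentAllocator;

void* mallocAlloc(size_t n) { return malloc(n); }
void mallocFree(void* p) { free(p); }

// Delegate style: install `a`, run the block, then restore the old allocator.
void withAllocator(Allocator a, scope void delegate() block) {
    auto saved = currentAllocator;
    currentAllocator = a;
    scope(exit) currentAllocator = saved; // also runs if block throws
    block();
}

// RAII style: the destructor ends the allocator scope.
struct ChangeAllocator {
    private Allocator saved;
    @disable this();     // an allocator must be supplied
    @disable this(this); // a copy would restore the old allocator twice
    this(Allocator a) {
        saved = currentAllocator;
        currentAllocator = a;
    }
    ~this() { currentAllocator = saved; }
}

unittest {
    auto malloced = Allocator(&mallocAlloc, &mallocFree);
    withAllocator(malloced, { assert(currentAllocator.alloc is &mallocAlloc); });
    assert(currentAllocator.alloc is null);
    {
        auto dummy = ChangeAllocator(malloced);
        assert(currentAllocator.release is &mallocFree);
    } // dummy's destructor runs here
    assert(currentAllocator.alloc is null);
}
```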

Actually, D's frontend leaves something to be desired when it comes to inlining delegates. It *is* done sometimes, but not as often as one may like. For example, opApply generally doesn't inline its delegate, even when it's just a thin wrapper around a foreach loop. But yeah, I think the former has nicer syntax. Maybe we can help the compiler with inlining by making the delegate a compile-time parameter? But that forces a switch of parameter order, which is Not Nice (hurts readability 'cos the allocator argument comes after the block instead of before).
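A sketch of the compile-time-parameter variant, again with a hypothetical `Allocator` handle kept in a thread-local variable. Passing the block as an `alias` template parameter means each call site instantiates a function that calls the block directly, which may make inlining easier, though nothing guarantees it; the price is exactly the parameter-order switch described above.

```d
// Hypothetical allocator handle; only the name matters for this sketch.
struct Allocator { string name; }

// Thread-local, like any module-level variable in D.
Allocator current = Allocator("gc");

// The block is an alias template parameter, so each call site gets its own
// instantiation that calls the block directly. That may help the inliner,
// but nothing guarantees the block will be inlined.
void withAllocator(alias block)(Allocator a) {
    auto saved = current;
    current = a;
    scope(exit) current = saved;
    block();
}

unittest {
    string seen;
    // The block now comes *before* the allocator argument.
    withAllocator!({ seen = current.name; })(Allocator("region"));
    assert(seen == "region");
    assert(current.name == "gc");
}
```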
 But, the template still has a big advantage: you can change the
 type. And I think that is potentially enormously useful.

True. It can use different types for different allocators that do (or don't) do cleanup at the end of the scope, depending on what the allocator needs to do.
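As a sketch of how a type parameter lets the scope helper decide at compile time whether any cleanup is needed: `RegionAllocator`, `MallocAllocator`, `AllocatorScope` and `releaseAll` are hypothetical, and the region's bulk "release" is only a counter here.

```d
// Hypothetical allocators. A region frees everything at once when its scope
// ends; a malloc-style allocator has nothing to do at that point.
struct RegionAllocator {
    static int releases; // thread-local counter, for illustration only
    static void releaseAll() { ++releases; }
}
struct MallocAllocator {}

// The allocator is a type parameter, so the cleanup decision is made at
// compile time, and the destructor does nothing for allocators that
// don't need it.
struct AllocatorScope(A) {
    @disable this(this);
    ~this() {
        static if (__traits(hasMember, A, "releaseAll"))
            A.releaseAll();
    }
}

unittest {
    immutable before = RegionAllocator.releases;
    { AllocatorScope!RegionAllocator s; }
    assert(RegionAllocator.releases == before + 1);
    { AllocatorScope!MallocAllocator s; }
    assert(RegionAllocator.releases == before + 1);
}
```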
 Another question is how to tie into output ranges. Take std.conv.to.
 
 auto s = to!string(10); // currently, this hits the gc
 
 What if I want it to go on a stack buffer? One option would be to
 rewrite it to use an output range, and then call it like:
 
 char[20] buffer;
 auto s = to!string(10, buffer); // it returns the slice of the buffer it actually used
 
 (and we can do overloads so to!string(10, radix) still works, as
 well as to!string(10, radix, buffer). Hassle, I know...)
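Phobos can already format into a caller-supplied buffer through `std.format.sformat`, which returns the slice of the buffer it filled. The `toStringInto` wrapper is a hypothetical stand-in for the proposed `to!string(10, buffer)` overload, not an existing `std.conv` function; note that `sformat` throws if the buffer is too small.

```d
import std.format : sformat;

// Hypothetical stand-in for the proposed to!string(value, buffer) overload:
// formats into caller-provided storage and returns the slice actually used.
char[] toStringInto(T)(T value, char[] buffer) {
    return sformat(buffer, "%s", value);
}

unittest {
    char[20] buffer;                 // stack storage supplied by the caller
    auto s = toStringInto(10, buffer[]);
    assert(s == "10");
    assert(s.ptr is buffer.ptr);     // a slice of buffer, not a fresh copy
}
```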

I think supporting the multi-argument version of to!string() is a good thing, but what to do with library code that calls to!string()? It'd be nice if we could somehow redirect those GC calls without having to comb through the entire Phobos codebase for stray calls to to!string(). [...]
 The fun part is the output range works for that, and could also work
 for something like this:
 
 import core.stdc.stdlib : realloc;
 
 struct malloced_string {
     char* ptr;
     size_t length;
     size_t capacity;
     void put(char c) {
         if(length >= capacity) {
             capacity = capacity ? capacity * 2 : 16;
             ptr = cast(char*) realloc(ptr, capacity);
         }
         ptr[length++] = c;
     }
 
     char[] slice() { return ptr[0 .. length]; }
     alias slice this;
     mixin RefCounted!this; // pretend this works
 }
 
 
 {
    malloced_string str;
    auto got = to!string(10, str);
 } // str is out of scope, so it gets free()'d.
 
 Unsafe though: if you stored a copy of got somewhere, it is now a
 pointer to freed memory. I'd kinda like language support of some sort
 to help mitigate that, like marking it as a borrowed pointer that
 isn't allowed to be stored, but that's another discussion.
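A runnable variant of `malloced_string`, assuming a single owner: copying is disabled with `@disable this(this)` in place of the imaginary `RefCounted` mixin, so the destructor frees exactly once. `MallocedString` and `toStringInto` are hypothetical; the latter stands in for the proposed `to!string(value, sink)` and is built on `std.conv.toChars`, which produces an integer's digits as a lazy range. The borrowed-slice hazard from the quote remains: the result of `slice` must not be kept past `str`'s lifetime.

```d
import core.exception : onOutOfMemoryError;
import core.stdc.stdlib : free, realloc;
import std.conv : toChars;

// Runnable take on malloced_string. Copying is disabled instead of using the
// imaginary RefCounted mixin, so exactly one owner calls free().
struct MallocedString {
    private char* ptr;
    private size_t len, cap;

    @disable this(this);

    void put(char c) {
        if (len >= cap) {
            immutable newCap = cap ? cap * 2 : 16;
            auto p = cast(char*) realloc(ptr, newCap);
            if (p is null) onOutOfMemoryError();
            ptr = p;
            cap = newCap;
        }
        ptr[len++] = c;
    }

    // Borrowed view: invalid once this struct has been destroyed.
    inout(char)[] slice() inout { return ptr[0 .. len]; }
    alias slice this;

    ~this() { free(ptr); } // free(null) is a no-op
}

// Hypothetical stand-in for to!string(value, sink): std.conv.toChars yields
// the decimal digits lazily, and each one is put into the given sink.
void toStringInto(Sink)(int value, ref Sink sink) {
    foreach (char c; value.toChars) sink.put(c);
}

unittest {
    MallocedString str;
    toStringInto(10, str);
    assert(str.slice == "10");
} // str goes out of scope here and its buffer is free()'d
```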

Nice!
 And that should work. So then what we might do is provide these
 little output range wrappers for various allocators, and use them on
 many functions.
 
 So we'd write:
 
 import std.allocators;
 import std.range;
 
 // mallocator is provided in std.allocators and offers the goods
 OutputRange!(char, mallocator) str;
 
 auto got = to!string(10, str);
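`std.allocators` and `OutputRange!(char, mallocator)` are proposals in this thread. As a rough sketch of the idea, an output range can take the allocator as a type parameter. Here `Mallocator` from `std.experimental.allocator` (a later Phobos addition, not something available when this was written) supplies `reallocate` and `deallocate`; `AllocatedBuffer` itself is hypothetical.

```d
import std.experimental.allocator.mallocator : Mallocator;

// Hypothetical output range whose allocator is part of its type, in the
// spirit of OutputRange!(char, mallocator). Alloc must provide a static
// `instance` with reallocate/deallocate, as Mallocator does.
struct AllocatedBuffer(Alloc) {
    private void[] store; // raw block obtained from Alloc
    private size_t used;

    @disable this(this);  // single owner, so the block is released once

    void put(char c) {
        if (used == store.length) {
            immutable newSize = store.length ? store.length * 2 : 16;
            if (!Alloc.instance.reallocate(store, newSize))
                assert(0, "out of memory");
        }
        (cast(char[]) store)[used++] = c;
    }

    const(char)[] data() const {
        return (cast(const(char)[]) store)[0 .. used];
    }

    ~this() {
        if (store.ptr !is null) Alloc.instance.deallocate(store);
    }
}

unittest {
    AllocatedBuffer!Mallocator buf;
    foreach (c; "hello") buf.put(c);
    assert(buf.data == "hello");
}
```

Because the allocator is in the type, `AllocatedBuffer!Mallocator` and a buffer built on some other allocator are distinct types, which is what makes the "who frees this?" question answerable from the type alone.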

I like this. However, it still doesn't address how to override the default allocator in, say, Phobos functions.
 What's nice here is the output range is useful for more than just
 allocators. You could also do to!string(10, my_file), or use a
 delegate, blah blah blah. So it isn't too much of a burden; it is
 something you might naturally use anyway.
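The same formatting code can target any output range. A sketch using `std.format.formattedWrite` with two sinks, an `Appender` and a delegate that receives chunks of text; `writeValue` is a hypothetical name. A file would work the same way through `std.stdio.File.lockingTextWriter()`.

```d
import std.array : appender;
import std.format : formattedWrite;

// Hypothetical helper: one formatting routine, many destinations.
// Any output range that accepts characters can be the sink.
void writeValue(Sink)(ref Sink sink, int value) {
    sink.formattedWrite("%s", value);
}

unittest {
    // A GC-backed Appender...
    auto app = appender!string();
    writeValue(app, 42);
    assert(app.data == "42");

    // ...or a delegate that receives chunks of text.
    char[] collected;
    auto dg = (const(char)[] chunk) { collected ~= chunk; };
    writeValue(dg, 7);
    assert(collected == "7");
}
```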

Now *that* is a very nice idea. I like having a way to bypass the intermediate string buffer and just write the output directly to where it's intended to go. I think to() with an output range parameter definitely should be implemented. It doesn't address all of the issues, but it's a very big first step IMO.
Also, we may have the problem of the wrong allocator
being used to free the object.

Another reason why encoding the allocator into the type is so nice. For the minimal D I've been playing with, the idea I'm running with is that all allocated memory has some kind of special type, and naked pointers are always assumed to be borrowed, so you should never store or free them.

Interesting idea. So basically you can tell which allocator was used to allocate an object just by looking at its type? That's not a bad idea, actually.
 auto foo = HeapArray!char(capacity);
 
 void bar(char[] lol){}
 
 bar(foo); // allowed, foo has an alias this on slice

This is nice. Hooray for alias this. :)
 // but....
 
 struct A {
    char[] lol; // not allowed, because you don't know when lol is going to be freed
 }
 
 
 foo frees itself with refcounting.
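A minimal sketch of `HeapArray`, assuming plain C allocation plus a hand-rolled reference count (the struct and its layout are hypothetical). `alias this` is what lets `bar(foo)` compile. The rule that struct members may not be naked slices stays a convention here; nothing in this sketch enforces it.

```d
import core.exception : onOutOfMemoryError;
import core.stdc.stdlib : calloc, free;

// Hypothetical HeapArray: C-heap storage shared by reference counting.
// alias this lends out a borrowed T[] wherever one is expected.
struct HeapArray(T) {
    private static struct Payload {
        size_t refs;
        T[] data;
    }
    private Payload* p;

    this(size_t n) {
        p = cast(Payload*) calloc(1, Payload.sizeof);
        auto mem = cast(T*) calloc(n, T.sizeof); // zero-filled, not T.init
        if (p is null || (n && mem is null)) onOutOfMemoryError();
        p.refs = 1;
        p.data = mem[0 .. n];
    }

    this(this) { if (p) ++p.refs; } // a copy shares ownership

    ~this() {
        if (p && --p.refs == 0) {   // the last owner frees both blocks
            free(p.data.ptr);
            free(p);
        }
    }

    inout(T)[] slice() inout { return p ? p.data : null; }
    alias slice this;
}

unittest {
    static void bar(char[] lol) { lol[0] = 'x'; }
    auto foo = HeapArray!char(4);
    bar(foo);                       // allowed via alias this
    assert(foo.slice[0] == 'x');
}
```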

This is a bit inconvenient. So your member variables will have to know what allocation type is being used. Not the end of the world, of course, but not as pretty as one would like.

On Wed, Jun 26, 2013 at 03:24:57AM +0200, Adam D. Ruppe wrote:
 I was just quickly skimming some criticism of C++ allocators, since
 my thought here is similar to what they do. On one hand, maybe D can
 do it right by tweaking C++'s design rather than discarding it.
 
 On the other hand, with all the C++ I've done, I have never actually
 used STL allocators, which could say something about me or could say
 something about them.
 
 
 One criticism I saw said that making a differently allocated object a
 different type sucks. ...but must it? The complaint there was "so
 much for just doing a function that takes a std::string". But, the
 way I'd want to do it in D is the function would take a char[]
 instead, and our special allocated type provides that via opSlice
 and/or alias this.

Yeah, I think alias this adds a whole new factor into the equation. Having a distinct type makes it much easier to implement, and allows you to mix differently-allocated objects without having to worry about things like calling the right version of gc_free to clean up properly. You can even have the same underlying data type be allocated in two different ways, and the cleanup will happen correctly.

Basically, when you allocate some object O of class C using allocator A, then no matter what you do with the gc_alloc/gc_free function pointers afterwards, O must be freed using A.free. So in a sense, O needs to carry around a function pointer to A.free in its dtor (or whoever frees it does). This actually argues for having a distinct type for an instance of C allocated using A, vs. an instance of C allocated using a different allocator B. You need to store those function pointers to A.free and B.free *somewhere*, otherwise things won't work properly. [...]
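Two ways to record "how this object must be freed", as a sketch with hypothetical names (`DynOwned`, `Owned`, `countingFree`, `plainFree`): a function pointer carried by every object, or the deallocator baked into the type. The `static assert`s show the space difference and that two deallocators yield two distinct types.

```d
import core.stdc.stdlib : free, malloc;

int freedCount; // thread-local counter, for illustration only

void countingFree(void* p) { ++freedCount; free(p); }
void plainFree(void* p) { free(p); }

// Option 1: every object carries the function that must free it.
struct DynOwned {
    void* ptr;
    void function(void*) release;
    @disable this(this);
    ~this() { if (ptr) release(ptr); }
}

// Option 2: the deallocator is baked into the type. No per-object storage,
// and objects from different allocators are different types.
struct Owned(alias releaseFn) {
    void* ptr;
    @disable this(this);
    ~this() { if (ptr) releaseFn(ptr); }
}

static assert(DynOwned.sizeof == 2 * (void*).sizeof);
static assert(Owned!(plainFree).sizeof == (void*).sizeof);
static assert(!is(Owned!(plainFree) == Owned!(countingFree)));

unittest {
    immutable before = freedCount;
    {
        auto a = Owned!countingFree(malloc(8));
        auto b = DynOwned(malloc(8), &countingFree);
    } // both destructors run here
    assert(freedCount == before + 2);
}
```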
 Anyway, bottom line is I don't think that criticism necessarily
 applies to D.

Agreed: in D, having distinct types per allocator is, at the very least, not as bad as it is in C++.
 But there's surely many others and I'm more or less a
 n00b re c++'s allocators so idk yet.

Who *isn't* a n00b wrt C++'s allocators, since so few people actually use them? :-P


T

-- 
He who sacrifices functionality for ease of use, loses both and deserves neither. -- Slashdotter
Jun 26 2013
"Adam D. Ruppe" <destructionator gmail.com> writes:
On Wednesday, 26 June 2013 at 16:40:20 UTC, H. S. Teoh wrote:
 I think supporting the multi-argument version of to!string() is 
 a good thing, but what to do with library code that calls 
 to!string()? It'd be nice if we could somehow redirect those GC 
 calls without having to comb through the entire Phobos codebase 
 for stray calls to to!string().

Let's consider what kinds of allocations we have. We can break them up into two broad groups: internal and visible.

Internal allocations, in theory, don't matter. These can be on the stack, the gc heap, malloc/free, whatever. The function itself is responsible for their entire lifetime. Changing their allocator either optimizes (in the case of reusing a region) or leaks (if you switch it to manual and the function doesn't know it).

Visible allocations are important because the caller is responsible for freeing them. Here, I really think we want the type system's help: either it should return something that we know we're responsible for, or take a buffer/output range from us to receive the data in the first place. Either way, the function signature should reflect what's going on with visible allocations. It'd possibly return a wrapped type, or take an output range/buffer/allocator.

With internals though, the only reason I can see why you'd want to change them outside the function is to give them a region of some sort to work with, especially since you don't know for sure what it is doing: these are all local variables to the function/call stack. And here, I don't think we want to change the allocator wholesale. At most, we'd want to give it hints that what we're doing is short lived. (Or, better yet, have it figure this out on its own, like a generational gc.)

So I think this is more about tweaking the gc than replacing it, at most adding a couple of new functions to it:

    GC.hint_short_lived // returns a helper struct with a static refcount:

    struct TempGcAllocator {
        static int tempCount = 0;
        static void* localRegion;
        this() { tempCount++; } // pretend this works
        ~this() {
            tempCount--;
            if(tempCount == 0)
                gc.tryToCollect(localRegion);
        }
        T create(T, Args...)(Args args) {
            return GC.new_short_lived T(args);
        }
    }

and gc.tryToCollect() does a quick scan for anything pointing into the local region. If nothing points in there, it frees the whole thing. If something does, in the name of memory safety, it just reintegrates that local region into the regular memory and gc's its components normally.

The reason the count is static is that you don't have to pass this thing down the call stack. Any function that wants to adapt to this generational hint system just calls hint_short_lived. If you're a leaf function, that's ok: the static count means you'll inherit the region from the function above you. You would NOT use this in main(), as that defeats the purpose.
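The static-count idea can be simulated to check the nesting behavior. druntime has no `GC.hint_short_lived`, `GC.new_short_lived` or `gc.tryToCollect`, so in this sketch the region is reduced to counters recording when a region would be created and when it would be handed back to the collector; `TempGcScope` and its members are hypothetical. Static struct members are thread-local in D, so each thread gets its own count, and because D structs cannot declare a default constructor (hence the "pretend this works" above), a static `enter()` factory takes its place.

```d
// Simulation only: druntime has no GC.hint_short_lived or tryToCollect, so
// the region is reduced to counters showing when one would be created and
// when it would be handed back for collection.
struct TempGcScope {
    static int tempCount;        // open scopes on this thread (TLS)
    static int regionsCreated;
    static int regionsCollected;

    private bool active;         // TempGcScope.init owns nothing
    @disable this(this);

    // Structs can't declare this(), so a static factory takes its place.
    static TempGcScope enter() {
        if (tempCount++ == 0) ++regionsCreated; // outermost caller makes it
        return TempGcScope(true);               // nested callers inherit it
    }

    ~this() {
        // The last scope to close would call gc.tryToCollect(localRegion).
        if (active && --tempCount == 0) ++regionsCollected;
    }
}

unittest {
    static void leaf() {
        auto hint = TempGcScope.enter(); // reuses the caller's region
        assert(TempGcScope.tempCount == 2);
    }
    immutable created = TempGcScope.regionsCreated;
    {
        auto hint = TempGcScope.enter();
        leaf();
        leaf();
    }
    assert(TempGcScope.regionsCreated == created + 1);
    assert(TempGcScope.tempCount == 0);
}
```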
 I think to() with an output range parameter definitely
 should be implemented.

No doubt about it: we should aim for most Phobos functions not to allocate at all if given an output range they can use.
 Interesting idea. So basically you can tell which allocator was 
 used to allocate an object just by looking at its type?

Right, then you'll know if you have to free() it. (Or it can free itself with its destructor.)
 This is a bit inconvenient. So your member variables will have 
 to know what allocation type is being used. Not the end of the
 world, of course, but not as pretty as one would like.

Yeah, you'd need to know if you own them or not too (are you responsible for freeing that string you just got passed? If not, are you sure it won't be freed while you're still using it?), but I just think that's a part of memory management you can't sidestep. There are two easy answers: 1) always make a private copy of anything you store (and perhaps write to), or 2) use a gc and trust it to always be the owner. In any other case, I think you *have* to think about it, and the type telling you can help you make that decision.
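A tiny sketch combining the two easy answers: a hypothetical `Config` that stores a borrowed name by taking a private copy with `idup`, which also makes that copy GC-owned.

```d
// Hypothetical type that stores a borrowed string safely: it keeps a private,
// GC-owned copy, so the caller is free to reuse or free its own buffer.
struct Config {
    string name;
    this(const(char)[] borrowedName) {
        name = borrowedName.idup;
    }
}

unittest {
    char[8] scratch = "server01";
    auto cfg = Config(scratch[]);
    scratch[] = 'x';                // the caller overwrites its buffer
    assert(cfg.name == "server01"); // our copy is unaffected
}
```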
 and allows you to mix differently-allocated objects without 
 having to

Important to remember though that you are borrowing these references, not taking ownership. I think the rule that all pointers/slices are borrowed is fairly workable. With the gc, that's ok: you don't own anything. The garbage collector is responsible for it all, so store away. (Though if it is mutable, you might want to idup it so you don't get overwritten by someone else. But that's a separate question from allocation method, and already encoded in D's type system.)

So never free() a naked pointer unless you know what you're doing (like interfacing with a C library); prefer to only free a ManuallyAllocated!(pointer). Hell, a C library binding could change the type too, and it'd still be binary compatible. RefCounted!T wouldn't be, but ManuallyAllocated!T would just be a wrapper around T*. I think I'm starting to ramble!
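A sketch of `ManuallyAllocated!T`, assuming it is nothing more than a single `T*` field (the wrapper and `mallocInt` are hypothetical). The `static assert` confirms it has the same size as the bare pointer; whether an `extern(C)` prototype can return such a struct in place of a pointer depends on the platform ABI, so the helper here wraps the pointer by hand.

```d
import core.stdc.stdlib : free, malloc;

// Hypothetical wrapper: the same size as a bare T*, but the type records
// that whoever holds it is responsible for calling free().
struct ManuallyAllocated(T) {
    T* ptr;
    alias ptr this; // lends out a naked, borrowed T* on demand

    void release() {
        free(ptr);
        ptr = null;
    }
}

static assert(ManuallyAllocated!(int).sizeof == (int*).sizeof);

// Hypothetical binding-style helper that hands back the wrapper instead of
// a naked pointer.
ManuallyAllocated!int mallocInt(int value) {
    auto p = cast(int*) malloc(int.sizeof);
    if (p !is null) *p = value;
    return ManuallyAllocated!int(p);
}

unittest {
    auto m = mallocInt(5);
    int* borrowed = m;   // via alias this; do not free() this one
    assert(*borrowed == 5);
    m.release();
    assert(m.ptr is null);
}
```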
Jun 26 2013