
digitalmars.D - Memory Safety without a GC or Ref Counting

reply "F i L" <witte2008 gmail.com> writes:
So I've been thinking about this problem off and on for awhile, 
and I think I have a rather simple solution. I highly doubt I'm 
the first to think of it, but haven't found any info about a 
similar idea on the net. So in the interest of science I'm 
posting it here so anyone interested can attempt to poke holes in 
the idea, which you folks have been so good at in the past.

This isn't really D related, but I usually like the feedback I 
get from this community, plus, if there's any value in the idea, 
maybe it would be a worthwhile discussion for future directions 
D might make use of.

I'll be illustrating with Pseudo-code. The basic idea spawns from 
conceptually separating variables as either "memory owners" or 
"memory references". Basically, the pseudo-code has three types:

var - Memory Owner, never null, deleted at end of scope**
ref - Memory Reference, can't "own" memory
ptr - Unsafe manually managed memory pointer (like C pointers)

So, 'ptr's aside for now, 'var's would be the only way to 
allocate memory, and memory is automatically allocated to them at 
their definition**. Vars would be deleted at the end of their 
scope (or when they're removed from dynamic-arrays, etc) unless 
they're returned, in which case they're "injected" into the 
caller's scope, and deleted at the end of that scope. Without too 
much lengthy explanation, here's some code:

     type Foo
     {
         var bar = 0
     }

     func main
     {
         ref f : Foo
         ref b : int

         scope
         {













         }



         var n = 6

         b => n







     }

So this is sort of like ref-counted scopes (smart scopes), except 
the compiler has a higher degree of guarantee on the lifetime of 
memory, and there's a lot of static analysis that can happen to 
optimize the clean up code (like how we didn't need to assign 'b' 
to null in the example above). All "ref counting" would be done 
through static-analysis at compile-time, and injected by the 
compiler, so it wouldn't have the runtime overhead of 
ref-counting, nor would there be a need for "weak references" 
since references are guaranteed to never "own" memory.
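For comparison, the three roles map roughly onto familiar C++ constructs --
only roughly, since a 'var' can never be null and the clean-up below has to
be written out by hand instead of being injected by a compiler (the mapping
is just an illustration, not part of the proposal):

    #include <iostream>
    #include <memory>

    struct Foo { int bar = 0; };

    int main()
    {
        Foo* f = nullptr;                       // 'ref' analogue: never owns anything

        {
            auto foo = std::make_unique<Foo>(); // 'var' analogue: owner, freed at scope end
            f = foo.get();                      // the ref borrows; no ownership moves
            f->bar = 1;
        }                                       // foo destroyed here; f would now dangle

        f = nullptr;                            // the "auto clean up" a compiler would inject

        Foo* p = new Foo;                       // 'ptr' analogue: fully manual
        p->bar = 2;
        std::cout << p->bar << '\n';
        delete p;                               // must be freed by hand
    }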

There's more to it of course. For instance, in order to have 
expanding memory, you'd need the concepts of a dynamic-array and 
type inheritance. You'd also need to not clean up memory that was 
returned... so...

     type Foo
     {
         var name : text
         init { name = "Foo" }
         func write { Console.write(name) }
     }

     type Bar : Foo
     {
         init { name = "Bar" }
     }

     ref global_foo : Foo

     func main
     {
         var foos = Foo[]











         func getFoo
         {
             var f = Foo
             f.name = "Fip"
             return f

         }

         global_foo = getFoo()








     }

Okay, so with that you could have global var arrays, and assign 
global refs to function return values, etc. If you need extra 
control, there's also 'ptr' types, which would require manual 
clean-up. However, 'ptr's would be automatically cleaned up when the 
program exits, so they would be safe to use for global references 
which only sometimes need to point to usable memory.
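The returned-'var' rule is close to what returning std::unique_ptr from a
factory function gives you in C++ today. A rough sketch of that analogy
(with the caller taking ownership, since a plain non-owning pointer to the
result would be free to dangle; names mirror the pseudo-code above):

    #include <iostream>
    #include <memory>
    #include <string>
    #include <vector>

    struct Foo {
        std::string name = "Foo";
        virtual ~Foo() = default;
        void write() const { std::cout << name << '\n'; }
    };

    struct Bar : Foo {
        Bar() { name = "Bar"; }
    };

    // The owner is created locally and handed to the caller on return,
    // much like a 'var' being "injected" into the caller's scope.
    std::unique_ptr<Foo> getFoo()
    {
        auto f = std::make_unique<Foo>();
        f->name = "Fip";
        return f;            // moved out, so it is not cleaned up here
    }

    int main()
    {
        std::vector<std::unique_ptr<Foo>> foos;    // rough stand-in for 'var foos = Foo[]'
        foos.push_back(std::make_unique<Foo>());
        foos.push_back(std::make_unique<Bar>());   // inheritance as in the Foo/Bar example

        auto global_foo = getFoo();                // the caller now owns the result
        global_foo->write();                       // prints "Fip"
    }                                              // foos and global_foo all freed here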

Anyways... comments and criticisms would be great. Sorry this 
isn't more D related. Like I said, I hope it does have value that D 
might make use of at some point.
Jan 21 2013
next sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Mon, Jan 21, 2013 at 06:37:36PM +0100, F i L wrote:
 So I've been thinking about this problem off and on for awhile, and
 I think I have a rather simple solution. I highly doubt I'm the
 first to think of it, but haven't found any info about a similar
 idea on the net. So in the interest of science I'm posting it here
 so anyone interested can attempt to poke holes in the idea, which
 you folks have been so good at in the past.
 
 This isn't really D related, but I usually like the feedback I get
 from this community, plus, if there's any value in the idea, maybe
 it would be a worthwhile discussion for future directions D might
 make use of.
 
 I'll be illustrating with Pseudo-code. The basic idea spawns from
 conceptually separating variables as either "memory owners" or
 "memory references". Basically, the pseudo-code has three types:
 
 var - Memory Owner, never null, deleted at end of scope**
 ref - Memory Reference, can't "own" memory
 ptr - Unsafe manually managed memory pointer (like C pointers)
[...]

I invented this scheme (independently -- I'm sure the concept has been
thought of before) way back when I was coding in C and C++. Of course, I
didn't have language/compiler support, but I did adopt a convention of
marking every pointer with [O] (for owner) and [R] (for reference) in
comments, in all my code. Owner pointers can never be assigned a reference
pointer, and owner pointers can only be destructively copied (the original
must be set to NULL after copying). Owner pointers must be free()'d if they
are non-NULL and going out of scope (unless they are returned from the
function, in which case I consider it as a conceptual destructive copy
followed by NULLing). I did allow owner pointers to be NULL -- mainly
because it is required for destructive copy to be sound.

In newer versions of C++ (I invented this scheme back in the 90's when
templates and STL were only starting to emerge), owner pointers are
essentially equivalent to STL's auto_ptr<T>.

This was essentially as far as I developed the concept -- and probably it's
as far as it will get, because the concept has limitations:

1) It required code to be manually written to deal with owner pointers
correctly (auto_ptr mostly made this automatic, but it is certainly NOT a
drop-in replacement for pointers);

2) It can't deal with cyclic structures correctly;

3) It is still unsafe: you can have a dangling reference to owned memory,
because the owner pointer goes out of scope and the memory gets
deallocated, but there can still be references lingering around somewhere.

IOW, it doesn't solve the problem of memory management. It is one level up
from having no distinction whatsoever between pointers (and scarily enough,
I've seen "enterprise" code that was obviously written without any such
distinction in mind), for sure, but it's not much more than that.


T

--
MACINTOSH: Most Applications Crash, If Not, The Operating System Hangs
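In code, the convention looks roughly like this -- bare pointers with the
[O]/[R] comment markers first, then the auto_ptr-style ownership that
std::unique_ptr now provides (a sketch with made-up names, not production
code):

    #include <cstdlib>
    #include <memory>

    struct Node { int value; };

    // The hand-written convention: [O] marks the owner, [R] a reference.
    void c_style()
    {
        Node* owner = static_cast<Node*>(std::malloc(sizeof(Node)));  /* [O] */
        Node* view  = owner;                                          /* [R] */

        Node* new_owner = owner;   /* destructive copy...              [O] */
        owner = nullptr;           /* ...the original must be NULLed       */

        view->value = 42;          /* fine for now                         */

        if (new_owner)             /* non-NULL owners are freed on exit    */
            std::free(new_owner);
        /* 'view' now dangles -- the hole the convention cannot close      */
    }

    // What auto_ptr approximated and std::unique_ptr now does properly.
    void cxx_style()
    {
        auto owner = std::make_unique<Node>();
        Node* view = owner.get();                 // [R] non-owning
        auto new_owner = std::move(owner);        // destructive copy; owner is now null
        view->value = 42;
        // new_owner is freed at scope end; 'view' dangles after that
    }

    int main()
    {
        c_style();
        cxx_style();
    }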
Jan 21 2013
parent reply "F i L" <witte2008 gmail.com> writes:
F i L wrote:
 [...code...]
     global_foo = getFoo()
 [...code...]
Typo, that should have been:

     global_foo => getFoo()

H. S. Teoh wrote:
 [...] because the concept has limitations:

 1) It required code to be manually written to deal with owner 
 pointers
 correctly (auto_ptr mostly made this automatic, but it is 
 certainly NOT
 a drop-in replacement for pointers);
I don't do a whole lot of C++, so I don't know a lot about its smart
pointer types. However, I doubt you could do anything like my idea in C++
(without a lot of extra syntax like you described), since it requires a
lot of automatic per-scope bookkeeping that I doubt C++ types alone could
accomplish.
 2) It can't deal with cyclic structures correctly;
That's why there would be a 'ptr' type :)
 3) It is still unsafe: you can have a dangling reference to 
 owned
 memory, because the owner pointer goes out of scope and the 
 memory gets
 deallocated, but there can still be references lingering around
 somewhere.
This is why there would need to be a lot of magic happening in the
compiler. When the compiler can't know, for sure, whether a reference that
was assigned to a local-scope var is *still* referencing that var at the
end of the scope (pretty common), then it checks before assigning it to
null. If the compiler can't determine, at all, which refs will be pointing
to a particular local var, it falls back to regular ref-counting (for the
var in question).

I've given it a bit of thought, and while runtime checks and even regular
ref-counting wouldn't be too uncommon, due to its var/ref design there
would be a great many places where the compiler could optimize away all
the runtime checks, and you normally wouldn't have to deal with the
cyclic-ref problem that ref-counting memory management usually brings to
the table (granted, ptr's aren't much better). Also, I can't remember
exactly, but because vars are tied to scopes, I think there's more
optimization that can happen with ref-counting because it happens at the
scope level... more thoughts needed here though.

One benefit of this I forgot to mention is that, like ref-counting, memory
is deterministic (with optional late-collection).

Thanks for your feedback!


Arne wrote:

 I haven't found any hole or improvement suggestion yet, my only
 comment is, I'd really love to use a system like this.

Me too! Let me know if you think of any flaws.


Jacob Carlborg wrote:
 Seems a bit like ARC (Automatic Reference Counting) that Clang 
 has implemented for Objective-C (and some C libraries on Mac OS 
 X). The compiler automatically inserts calls to "retain" and 
 "release" where appropriate.
Interesting, I will have to look into that. Thanks for the info.
Jan 21 2013
next sibling parent "F i L" <witte2008 gmail.com> writes:
F i L wrote:
 ... Also, I can't remember exactly, but because var's are tied 
 to scopes, I think there's more optimization that can happen 
 with ref-counting because it happens at the scope level... more 
 thoughts needed here though.
Ah right I remember. Since vars only ever get collected at the end of a scope, you don't need to check for zero-refs on every dereference, only at the end of the var's scope.
Jan 21 2013
prev sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Mon, Jan 21, 2013 at 09:27:20PM +0100, F i L wrote:
[...]
 H. S. Teoh wrote:
[...]
3) It is still unsafe: you can have a dangling reference to owned
memory, because the owner pointer goes out of scope and the memory
gets deallocated, but there can still be references lingering around
somewhere.
This is why there would need to be a lot of magic happening in the
compiler. When the compiler can't know, for sure, whether a reference that
was assigned to a local-scope var is *still* referencing that var at the
end of the scope (pretty common), then it checks before assigning it to
null. If the compiler can't determine, at all, which refs will be pointing
to a particular local var, it falls back to regular ref-counting (for the
var in question).
[...]

Hmm. So you're essentially saying that the compiler will do compile-time
analysis of variable usage, and based on that choose the simplest memory
management model for it?

That's not a bad idea. You can have a series of increasingly complex
management techniques, and the compiler could choose the least complex one
that it can statically prove is safe. So if something can be proven not to
escape the current scope, it can even be stack-allocated (or if it's big,
put it on the manual heap and delete it on scope exit). If it is returned
with no other references to it, then it can be a single-reference pointer
(i.e. your owner type). If it has cyclic references, use a
reference-counted pointer. If all else fails, fall back to the GC.

The problem is that quite often, the compiler will not be able to prove
much about the data structure, esp. when you have non-trivial manipulation
of pointers -- it requires solving the halting problem to achieve that
level of analysis on arbitrary code -- so it will probably fall back to
the GC quite often, depending on what the code does.


T

--
My program has no bugs! Only undocumented features...
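Spelled out by hand with standard C++ types, the escalation might look
roughly like this (purely illustrative -- the point of the idea is that
the compiler would pick these forms itself, and the names here are made
up):

    #include <memory>

    struct Big { int data[1024] = {}; };

    // 1) Provably non-escaping: plain stack allocation, freed on frame exit.
    int local_only()
    {
        Big b;
        return b.data[0];
    }

    // 2) Escapes by return with a single owner: unique ownership.
    std::unique_ptr<Big> single_owner()
    {
        return std::make_unique<Big>();
    }

    // 3) Shared (possibly cyclic) lifetime: reference counting; genuine
    //    cycles would still need weak links or the GC fallback.
    std::shared_ptr<Big> shared_owner()
    {
        return std::make_shared<Big>();
    }

    int main()
    {
        int x = local_only();
        auto u = single_owner();
        auto s1 = shared_owner();
        auto s2 = s1;            // count rises to 2; freed once both are gone
        (void)x;
    }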
Jan 21 2013
next sibling parent reply Paulo Pinto <pjmlp progtools.org> writes:
Am 21.01.2013 22:25, schrieb H. S. Teoh:
 On Mon, Jan 21, 2013 at 09:27:20PM +0100, F i L wrote:
 [...]
 H. S. Teoh wrote:
[...]
 3) It is still unsafe: you can have a dangling reference to owned
 memory, because the owner pointer goes out of scope and the memory
 gets deallocated, but there can still be references lingering around
 somewhere.
This is why there would need to be a lot of magic happening in the
compiler. When the compiler can't know, for sure, whether a reference that
was assigned to a local-scope var is *still* referencing that var at the
end of the scope (pretty common), then it checks before assigning it to
null. If the compiler can't determine, at all, which refs will be pointing
to a particular local var, it falls back to regular ref-counting (for the
var in question).
[...]

Hmm. So you're essentially saying that the compiler will do compile-time
analysis of variable usage, and based on that choose the simplest memory
management model for it?

That's not a bad idea. You can have a series of increasingly complex
management techniques, and the compiler could choose the least complex one
that it can statically prove is safe. So if something can be proven not to
escape the current scope, it can even be stack-allocated (or if it's big,
put it on the manual heap and delete it on scope exit). If it is returned
with no other references to it, then it can be a single-reference pointer
(i.e. your owner type). If it has cyclic references, use a
reference-counted pointer. If all else fails, fall back to the GC.

The problem is that quite often, the compiler will not be able to prove
much about the data structure, esp. when you have non-trivial manipulation
of pointers -- it requires solving the halting problem to achieve that
level of analysis on arbitrary code -- so it will probably fall back to
the GC quite often, depending on what the code does.

T

I see how this could work in a JIT environment, but not in a static
compiler with modules compiled in separate ways.

--
Paulo
Jan 21 2013
parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Mon, Jan 21, 2013 at 11:36:58PM +0100, Paulo Pinto wrote:
 Am 21.01.2013 22:25, schrieb H. S. Teoh:
[...]
Hmm. So you're essentially saying that the compiler will do
compile-time analysis of variable usage, and based on that choose the
simplest memory management model for it?

That's not a bad idea. You can have a series of increasingly complex
management techniques, and the compiler could choose the least
complex one that it can statically prove is safe. So if something can
be proven not to escape the current scope, it can even be
stack-allocated (or if it's big, put it on the manual heap and delete
it on scope exit). If it is returned with no other references to it,
then it can be a single-reference pointer (i.e. your owner type). If
it has cyclic references, use a reference-counted pointer. If all
else fails, fall back to the GC.

The problem is that quite often, the compiler will not be able to
prove much about the data structure, esp. when you have non-trivial
manipulation of pointers -- it requires solving the halting problem
to achieve that level of analysis on arbitrary code -- so it will
probably fall back to the GC quite often, depending on what the code
does.
[...]
 I see how this could work in a JIT environment, but not in a static
 compiler with modules compiled in separate ways.
[...]

It can be made to work with separately-compiled modules if there's
language support (say in the form of exported pointer attributes). But
probably not feasible in the current version of D.


T

--
Designer clothes: how to cover less by paying more.
Jan 21 2013
prev sibling parent reply "F i L" <witte2008 gmail.com> writes:
H. S. Teoh wrote:
 The problem is that quite often, the compiler will not be able 
 to prove
 much about the data structure, esp. when you have non-trivial
 manipulation of pointers -- it requires solving the halting 
 problem to
 achieve that level of analysis on arbitrary code -- so it will 
 probably
 fall back to the GC quite often, depending on what the code 
 does.
Well, I misspoke on exactly what I meant by "falling back to
ref-counting"... I actually don't think you'd need anything like a typical
GC or ref-counting system, since this memory model is so rigid it gives a
bit more guarantee on when something is allocated and deleted; therefore,
when references need to be cleaned up can be a bit more optimized (at
least I think). For any local variables, most of the cleanup code can be
statically injected (like I illustrated above).

Really the only area where you need to dynamically track and "nullify"
references is dynamically allocated objects, like DynamicArrays. For them
we can fall back on a sort of "garbage collection", however I think it can
be better (doesn't need cyclic detection, is deterministic, etc). Let me
illustrate... In the compiler, our DynamicArray type might look like:

    {
        ptr pointer : T
        var length : type(ptr)
        ptr refs = MemoryPool(ref T[])

        func => (index, r:ref T)
        {
            refs[index] += r
        }

        func -= (item)
        {
            ...remove item...

            var rfs = refs[getIndex(item)]

            each r in rfs
            {
                if r ==> item { r => null }
            }

            refs -= rfs
        }
    }

Basically, syntax like:

    var foos = Foo[]

is just sugar for:

    var foos = DynamicArray(Foo)

You'll notice that the DynamicArray type has a reference 'refs' to a
MemoryPool of type 'ref T[]'. This MemoryPool object would probably be
shared across multiple/all DynamicArrays for optimization, but for now
just think of it as a simple dynamic-array-of-arrays itself. The functions
'=>' & '-=' are operators, and happen whenever a variable is referenced to
an item, and when an item is removed, respectively. To illustrate:

    ref a, b : int

    func main
    {
        each i in nums
        {
            if runtime_condition
            {
            }
        }
    }

So while this would search through a list of refs every time an item was
removed, it wouldn't have to do any complex cyclic detection, or other
such things. The MemoryPools could be a universally managed heap of
references that each DynamicArray only held a slice-reference to;
therefore, allocating arrays would still be cheap.

That's more of what I meant by falling back to "ref-counting" (more like
reverse-referencing) type systems in cases where you need them. Combined
with all the optimizations that could occur with local variables, I think
a system like this could have a lot of potential as an alternative to a
thread-locking GC.

Still... I have this feeling like there's some obvious flaw in the design
I just can't see yet...
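The same reverse-reference idea, sketched in C++ for a container of ints
(the class and its names are made up for illustration, and reallocation is
deliberately ignored here):

    #include <cstddef>
    #include <vector>

    // The container records the addresses of the outside pointers bound to
    // each element, and nulls them when that element is removed.
    class TrackedArray {
        std::vector<int> items;
        std::vector<std::vector<int**>> refs;   // one back-reference list per item
    public:
        void push(int v) { items.push_back(v); refs.emplace_back(); }

        // The pseudo-code's '=>' operator: bind an outside ref to items[i].
        void bind(std::size_t i, int*& r) {
            r = &items[i];
            refs[i].push_back(&r);
        }

        // The pseudo-code's '-=' operator: remove an item and null its refs.
        void remove(std::size_t i) {
            for (int** r : refs[i]) *r = nullptr;
            items.erase(items.begin() + i);
            refs.erase(refs.begin() + i);
            // Refs bound to later elements still point into 'items'; a fuller
            // version would re-bind them after the erase (or a reallocation).
        }
    };

    int main()
    {
        TrackedArray nums;
        nums.push(1); nums.push(2);

        int* a = nullptr;
        nums.bind(0, a);     // a now refers to nums' first element

        nums.remove(0);      // a is automatically set back to null
        return a == nullptr ? 0 : 1;
    }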
Jan 24 2013
next sibling parent reply "TommiT" <tommitissari hotmail.com> writes:
On Monday, 21 January 2013 at 17:37:38 UTC, F i L wrote:
     type Foo
     {
         var bar = 0
     }

     func main
     {
         ref f : Foo
         ref b : int

         scope
         {













         }


I think there should be also b => null in the implicit 'auto clean up'
phase.

I think it's good to note whether errors are compilation errors or runtime
errors. I assume f.bar = 4 in the end there would be a compilation error?

On Friday, 25 January 2013 at 02:12:56 UTC, F i L wrote:

     ref a, b : int

     func main
     {






What if we now (after the code snippet above) write:

    nums += 4

And nums has run out of room in its place in the heap and needs to
re-allocate its data (4 ints now). I guess that would mean that the call
+= should nullify all refs pointing to nums (a and b). If we now write:

    a = -3

...that can't be a compilation error, because it's not known at
compile-time whether or not nums += 4 forces a re-allocation. Wouldn't
this mean, then, that all accesses through references would require a
runtime penalty for checking whether the ref is null or not?
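Concretely, the cost in question is something like the check below on
every access through a ref (a sketch; CheckedRef and its names are made
up, not anything the proposal specifies):

    #include <cassert>

    // What a compiler-inserted null check on every dereference amounts to.
    template <typename T>
    struct CheckedRef {
        T* p;

        T& get()
        {
            assert(p != nullptr && "ref was nullified by a re-allocation/removal");
            return *p;   // every access pays for the branch above
        }
    };

    int main()
    {
        int n = 6;
        CheckedRef<int> b{&n};
        b.get() = 4;       // fine: b still points at n
        b.p = nullptr;     // what 'nums += 4' forcing a re-allocation would do
        // b.get() = -3;   // would trip the runtime check, not a compile error
    }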
Jan 25 2013
parent reply "F i L" <witte2008 gmail.com> writes:
On Friday, 25 January 2013 at 11:28:16 UTC, TommiT wrote:
 On Monday, 21 January 2013 at 17:37:38 UTC, F i L wrote:
    type Foo
    {
        var bar = 0
    }

    func main
    {
        ref f : Foo
        ref b : int

        scope
        {













        }


I think there should be also b => null in the implicit 'auto clean up' phase.
b didn't need to be assigned to null because the compiler could determine it wasn't being directly accessed after that scope (it was assigned to a new var first). It was intentionally left out to illustrate how the compiler could optimize special cases like that. I wrote that in my original post a bit after that code.
 I think it's good to note whether errors are compilation errors 
 or runtime errors. I assume f.bar = 4 in the end there would be 
 a compilation error?
If the compiler could determine that a ref was being used directly after being cleaned up, then yes, the compiler could throw an error. But that can't happen in every case, so when it can't you'll get a runtime error. Sorry I wasn't clear about that.
 On Friday, 25 January 2013 at 02:12:56 UTC, F i L wrote:

    ref a, b : int

    func main
    {






 What if we now (after the code snippet above) write:

     nums += 4

 And nums has run out of room in its place in the heap and needs to
 re-allocate its data (4 ints now). I guess that would mean that the call
 += should nullify all refs pointing to nums (a and b). If we now write:

     a = -3

 ...that can't be a compilation error, because it's not known at
 compile-time whether or not nums += 4 forces a re-allocation. Wouldn't
 this mean, then, that all accesses through references would require a
 runtime penalty for checking whether the ref is null or not?
The specific code you're talking about is a demonstration of an area where
the compiler can't (like you said) determine the clean-up code at
compile-time, and therefore has to use a runtime approach to nullifying
references.

The issue you raise isn't really a problem though. If the memory gets
rearranged, then the runtime could inform the DynamicArray, and it, in
turn, could reassign those refs to the new memory address.

The idea here is that if you're writing something complex like a
Container, you may need to use the unmanaged 'ptr' type to get the job
done. However, such containers could still make safety guarantees to their
users. So if the standard libs provided enough basic building blocks, and
the compiler supported advanced meta-programming features (like in D),
then programmers would rarely, if ever, need to touch a 'ptr' to make
applications.
Jan 25 2013
parent "TommiT" <tommitissari hotmail.com> writes:
On Friday, 25 January 2013 at 12:05:42 UTC, F i L wrote:
 The issue you raise isn't really a problem though. If the 
 memory gets rearranged then the runtime could inform the 
 DynamicArray, and it, in turn, could reassign those refs to the 
 new memory address.
Oh, of course.

By the way, what exactly are those things pointed to by the refs member of
DynamicArray? Are they references/pointers to reference variables, or
something?

One thing I don't like about this design is that even if I never make any
ref variables that point to the elements of a certain DynamicArray arr,
that arr still has to have the member data refs, and has to do some extra
work inside func -= (item).
Jan 25 2013
prev sibling parent reply "TommiT" <tommitissari hotmail.com> writes:
What if we re-assign a ref variable to another (or same) element 
of a DynamicArray:


ref foo : int

nums += 1
nums += 2


foo = 42


Given the definition of DynamicArray's operator =>:

func => (index, r:ref T)
{
     refs[index] += r
}

...it would mean that refs would hold two instances of foo (at 
different indices).

Also, it looks like r is passed as a copy of a ref variable, 
which to me would indicate that saying r => null wouldn't nullify 
the original ref variable that r was copied from (in this case 
the ref variable foo).
Jan 25 2013
parent "F i L" <witte2008 gmail.com> writes:
On Friday, 25 January 2013 at 13:14:43 UTC, TommiT wrote:
 What if we re-assign a ref variable to another (or same) 
 element of a DynamicArray:


 ref foo : int

 nums += 1
 nums += 2


 foo = 42


 Given the definition of DynamicArray's operator =>:

 func => (index, r:ref T)
 {
     refs[index] += r
 }

 ...it would mean that refs would hold two instances of foo (at
 different indices).
Yeah I caught that bug after I originally posted the code as well. I have a solution, but it's ugly, so I'm trying to find a better one. Thanks for the input! I will let you know if I come up with anything elegant.
Jan 25 2013
prev sibling next sibling parent "Arne" <arne petterson.nu> writes:
On Monday, 21 January 2013 at 17:37:38 UTC, F i L wrote:
 This isn't really D related, but I usually like the feedback I 
 get from this community, plus, if there's any value in the 
 idea, maybe it would be a worth while discussion for future 
 directions D might make use of.
I haven't found any hole or improvement suggestion yet, my only comment is, I'd really love to use a system like this.
Jan 21 2013
prev sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2013-01-21 18:37, F i L wrote:
 So I've been thinking about this problem off and on for awhile, and I
 think I have a rather simple solution. I highly doubt I'm the first to
 think of it, but haven't found any info about a similar idea on the net.
 So in the interest of science I'm posting it here so anyone interested
 can attempt to poke holes in the idea, which you folks have been so good
 at in the past.
Seems a bit like ARC (Automatic Reference Counting) that Clang has
implemented for Objective-C (and some C libraries on Mac OS X). The
compiler automatically inserts calls to "retain" and "release" where
appropriate.

--
/Jacob Carlborg
Jan 21 2013
parent reply Michel Fortin <michel.fortin michelf.ca> writes:
On 2013-01-21 19:22:04 +0000, Jacob Carlborg <doob me.com> said:

 On 2013-01-21 18:37, F i L wrote:
 So I've been thinking about this problem off and on for awhile, and I
 think I have a rather simple solution. I highly doubt I'm the first to
 think of it, but haven't found any info about a similar idea on the net.
 So in the interest of science I'm posting it here so anyone interested
 can attempt to poke holes in the idea, which you folks have been so good
 at in the past.
Seems a bit like ARC (Automatic Reference Counting) that Clang has implemented for Objective-C (and some C libraries on Mac OS X). The compiler automatically inserts calls to "retain" and "release" where appropriate.
And I'll confirm that it's no magic either. Avoiding cyclic references
isn't that easy when you start mixing implicitly retained pointers with
Objective-C's blocks (lambdas in C++ or delegate literals in D).

I'm a little sad Apple decided to deprecate its Objective-C garbage
collector. Having the choice between GC and reference counting was great
(although it surely sucks for library developers who need to support both
modes).

--
Michel Fortin
michel.fortin michelf.ca
http://michelf.ca/
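The same retain-cycle trap exists with plain C++ reference counting -- a
minimal analogue, with shared_ptr standing in for ARC's implicit retains
and weak_ptr for the annotation the programmer has to remember (names are
made up for illustration):

    #include <functional>
    #include <memory>

    struct Controller {
        std::function<void()> callback;   // plays the role of an Objective-C block
    };

    int main()
    {
        auto c = std::make_shared<Controller>();

        // Capturing 'c' strongly inside its own callback would create a cycle:
        // the object keeps the closure alive and the closure keeps the object
        // alive, so neither refcount ever reaches zero.
        //   c->callback = [c] { /* use c */ };

        // The usual fix is a weak capture, the moral equivalent of __weak:
        std::weak_ptr<Controller> wc = c;
        c->callback = [wc] { if (auto s = wc.lock()) { (void)s; /* use s */ } };

        c->callback();
    }   // with the weak capture, the Controller is destroyed here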
Jan 21 2013
parent Jacob Carlborg <doob me.com> writes:
On 2013-01-22 01:46, Michel Fortin wrote:

 And I'll confirm that it's no magic either. Avoiding cyclic references
 isn't that easy when you start mixing implicitly retained pointers with
 Objective-C's blocks (lambdas in C++ or delegate literals in D).

 I'm a little sad Apple decided to deprecate its Objective-C garbage
 collector. Having the choice between GC and reference counting was great
 (although it surely sucks for library developers who need to support
 both modes).
Yeah, I agree, but ARC works for iOS as well. They never implemented the
GC on iOS. Now MacRuby is available on iOS, which is probably due to ARC.

--
/Jacob Carlborg
Jan 21 2013