www.digitalmars.com         C & C++   DMDScript  

D - Closures Part II - local variables outliving function activations

reply la7y6nvo shamko.com writes:
To repeat myself from an earlier posting -

    People who are used to stack-based local variable frames may be
    surprised by the idea that local variables might outlive the function
    activation.  Rather than get into that in too much detail [in that
    posting] I think a follow-up post is a better idea.  Expect that
    follow-up post shortly.

This posting is that follow-up.

There are two issues.  Issue one, is it useful for closures to have
access to local variables?  Issue two, assuming they do, how should
closures behave when they access local variables whose containing
function activations have exited?



Issue one:  Is it useful for closures to have access to local
variables?  (The access referred to is real shared access, not access
by value or some sort of IN/OUT or value/result semantics.)

My response is, it is useful, and here's why.  First, it comes up in
real use, not on every closure certainly, but often enough so that
there needs to be some way of sharing local state.  Second, closures
tend to be short, so using some other mechanism that adds code would
make a substantial difference.  Third, natural expression - using
local variables inside closures looks natural and also produces
expectable results, much the same way as the functions (methods)
inside a class affect the instance variables of the object that's an
instance of the class.



Issue two:  How should closures behave when they access local
variables whose containing function activation has exited?

There are two main possibilities:  one, same as taking the address of
something on the stack (which is to say, a runtime error in the best
case, and random memory accesses in the more typical case);  two, if
these variables are allocated so that they can outlive the function
activation (and are collected by gc), the behavior is just the same as
the case where the function activation has not exited.

People who program in C or C++ will probably expect (one).  In spite
of this I claim (two) is better, for the following reasons.

First, D is moving in the direction of more benign semantics, for
example garbage collection and array bounds checking.  So providing
behavior (two) is consistent with where D is going.

Second, people who are expecting (two) and get (one) will have to deal
with what is probably a nasty bug.  On the other hand, people who are
expecting (one) and get (two) will be pleasantly surprised that the
"anomalous" condition is handled in a way that works.  It's nicer
to be pleasantly surprised than unpleasantly surprised.

Third, the rule for how closures work is now very simple - they work
the same whether the surrounding function activation has exited or
not.  Simplifying the programming model means simplifying programming.


Editorial comment:
==================

The situation with long-lasting local variable lifetimes is similar to
the situation with garbage collection (and indeed they are related).
At first it seems a little unnatural, because it's not what one is
used to;  but after having it, it's hard to imagine living without.

(End of editorial comment.)
Nov 07 2001
next sibling parent reply Axel Kittenberger <axel dtone.org> writes:
 Issue two:  How should closures behave when they access local
 variables whose containing function activation has exited?

I don't understand a bit what you're talking all the time about, how can a closure outlife it's mother function? Is it an own thread? - Axel
Nov 07 2001
parent la7y6nvo shamko.com writes:
A closure can outlive its mother function in the same way that
the address of a local variable can outlive the stack frame of
the mother function - by being stored in a variable that
has a longer lifetime than the mother function stack frame.
And in fact the mechanism for dealing with local variables used
by closures could also be used for any local variable that had
its address taken, so that no dangling-pointer-to-stack-variables
could occur.  I think it would be nice if the compiler would
support an option like that, but that's a whole other digression.

Axel Kittenberger <axel dtone.org> writes:

 Issue two:  How should closures behave when they access local
 variables whose containing function activation has exited?

I don't understand a bit what you're talking all the time about, how can a closure outlife it's mother function? Is it an own thread? - Axel

Nov 07 2001
prev sibling next sibling parent reply "Pavel \"EvilOne\" Minayev" <evilone omen.ru> writes:
<la7y6nvo shamko.com> wrote in message
news:s7c8zdiubo6.fsf michael.shamko.com...

 Issue two:  How should closures behave when they access local
 variables whose containing function activation has exited?

 There are two main possibilities:  one, same as taking the address of
 something on the stack (which is to say, a runtime error in the best
 case, and random memory accesses in the more typical case);  two, if
 these variables are allocated so that they can outlive the function
 activation (and are collected by gc), the behavior is just the same as
 the case where the function activation has not exited.

I wonder how can this be made technically? Since locals are on stack, returning from the function makes them invalid. Any other mechanism for storing them (like, say, a separate RTL-maintained stack) would be waay slower.
Nov 07 2001
next sibling parent "Robert W. Cunningham" <rwc_2001 yahoo.com> writes:
Pavel \"EvilOne\" Minayev wrote:

 <la7y6nvo shamko.com> wrote in message
 news:s7c8zdiubo6.fsf michael.shamko.com...

 Issue two:  How should closures behave when they access local
 variables whose containing function activation has exited?

 There are two main possibilities:  one, same as taking the address of
 something on the stack (which is to say, a runtime error in the best
 case, and random memory accesses in the more typical case);  two, if
 these variables are allocated so that they can outlive the function
 activation (and are collected by gc), the behavior is just the same as
 the case where the function activation has not exited.

I wonder how can this be made technically? Since locals are on stack, returning from the function makes them invalid. Any other mechanism for storing them (like, say, a separate RTL-maintained stack) would be waay slower.

Ok, I'll take a stab at it, but in a decidedly non-thread-safe and non-OO way... char *utoa( unsigned int i ) { // An unsigned int can be 128 bits or more on some systems. Allow for it, and for the trailing null static char buf[1 + sizeof(unsigned int) * 3]; char *head, *tail, temp; tail = head = &buf[0]; // Get the digits in low-to-high order while (i > 0) { *tail++ = '0' + (i%10); i /= 10; } // Append a null and point to the last digit. *tail-- = '\0'; // Place the digits in the right order. while (head < tail) { temp = *head; *head++ = *tail; *tail-- = temp; } return &buf[0]; } Yes, it is almost like one of our favorite functions, itoa(), doing what it should. But the static buffer it allocates cannot ever be free()ed or GC'ed until it can be guaranteed that the function will not be called again. How would/could this situation be better handled via closures, instead of via "static"? Should local variables ever outlast their scope without an explicit declaration to permit it (and guarantee it)? Of course, in D we could always simply change the declaration of buf[] to something like this: char *buf; buf = new char[(sizeof(unsigned int) * 3]; And then simply let the GC handle the free() for us in the context of the caller, right? Or am I missing something here? -BobC
Nov 07 2001
prev sibling parent reply la7y6nvo shamko.com writes:
The storage for such local variables would be dynamically allocated
using the regular storage allocator.  Yes, it would be slower than
just incrementing a stack pointer, but not a lot slower than just
a regular function call - storage allocators are pretty good at
running quickly these days.

"Pavel \"EvilOne\" Minayev" <evilone omen.ru> writes:

 <la7y6nvo shamko.com> wrote in message
 news:s7c8zdiubo6.fsf michael.shamko.com...
 
 Issue two:  How should closures behave when they access local
 variables whose containing function activation has exited?

 There are two main possibilities:  one, same as taking the address of
 something on the stack (which is to say, a runtime error in the best
 case, and random memory accesses in the more typical case);  two, if
 these variables are allocated so that they can outlive the function
 activation (and are collected by gc), the behavior is just the same as
 the case where the function activation has not exited.

I wonder how can this be made technically? Since locals are on stack, returning from the function makes them invalid. Any other mechanism for storing them (like, say, a separate RTL-maintained stack) would be waay slower.

Nov 07 2001
parent reply "Pavel \"EvilOne\" Minayev" <evilone omen.ru> writes:
<la7y6nvo shamko.com> wrote in message
news:s7cvggmrkt1.fsf michael.shamko.com...

 The storage for such local variables would be dynamically allocated
 using the regular storage allocator.  Yes, it would be slower than
 just incrementing a stack pointer, but not a lot slower than just
 a regular function call - storage allocators are pretty good at
 running quickly these days.

Everything is not that simple though. Since functions are re-entrant, you can't just allocate locals on heap - you have to manage a stack for them yourself (which is probably slower). This is another source of problems: suppose we have function A called 3 times, so now there's three copies of its locals on the stack. The last time we called it, it returned, directly or indirectly, a closure referencing one of its locals, so we can't clean the stack, at least not completely. SO WE HAVE ALL THE LOCALS GENERATED BY PREVIOUS FUNCTION CALLS HANGING THERE IN THE MEMORY WASTING SPACE - without knowing when to release them (since closure has outlived the function and can be activated at any point of the program)! Now what if the function is recursive and calls itself a thousand times? If all locals are preserved, not only those that are referenced from closures, as you suggest, than this problem will arise with each and every function. To make it simpler - since every function has its own copy of locals, and since it can possibly return a pointer to one of locals, and since locals should be able to outlive the function, we cannot clear the stack when leaving the function at all! ...am I missing something?
Nov 07 2001
parent reply la7y6nvo shamko.com writes:
Only the heap-allocated locals that are referenced by a closure
that is still accessible need be kept.  Any others would be
cleaned up just like any other unreachable object.  Also any
local variables _not_ referenced by closures can be allocated on
the stack and pruned in the usual way.  No stack frames need
be kept.

Does this answer your question?

"Pavel \"EvilOne\" Minayev" <evilone omen.ru> writes:

 <la7y6nvo shamko.com> wrote in message
 news:s7cvggmrkt1.fsf michael.shamko.com...
 
 The storage for such local variables would be dynamically allocated
 using the regular storage allocator.  Yes, it would be slower than
 just incrementing a stack pointer, but not a lot slower than just
 a regular function call - storage allocators are pretty good at
 running quickly these days.

Everything is not that simple though. Since functions are re-entrant, you can't just allocate locals on heap - you have to manage a stack for them yourself (which is probably slower). This is another source of problems: suppose we have function A called 3 times, so now there's three copies of its locals on the stack. The last time we called it, it returned, directly or indirectly, a closure referencing one of its locals, so we can't clean the stack, at least not completely. SO WE HAVE ALL THE LOCALS GENERATED BY PREVIOUS FUNCTION CALLS HANGING THERE IN THE MEMORY WASTING SPACE - without knowing when to release them (since closure has outlived the function and can be activated at any point of the program)! Now what if the function is recursive and calls itself a thousand times? If all locals are preserved, not only those that are referenced from closures, as you suggest, than this problem will arise with each and every function. To make it simpler - since every function has its own copy of locals, and since it can possibly return a pointer to one of locals, and since locals should be able to outlive the function, we cannot clear the stack when leaving the function at all! ...am I missing something?

Nov 08 2001
parent reply "Pavel \"EvilOne\" Minayev" <evilone omen.ru> writes:
<la7y6nvo shamko.com> wrote in message
news:s7csnbpsmx5.fsf michael.shamko.com...

 Only the heap-allocated locals that are referenced by a closure
 that is still accessible need be kept.  Any others would be
 cleaned up just like any other unreachable object.  Also any
 local variables _not_ referenced by closures can be allocated on
 the stack and pruned in the usual way.  No stack frames need
 be kept.

Does this mean that all locals that are referenced by a closure automatically become static (since their only instance reside on heap)? Or will there be a separate stack for them (again, on heap)? BTW another problem: compiler can't always be sure that closure gets returned from the function on a specific call - if, for example, it's in conditional statement. If closure references, say, a local string, and function is called several times, but actually returns closure only once, all instances of that local string are preserved till the end of the program...
Nov 08 2001
parent reply a <a b.c> writes:
Pavel \"EvilOne\" Minayev wrote:
 
 <la7y6nvo shamko.com> wrote in message
 news:s7csnbpsmx5.fsf michael.shamko.com...
 
 Only the heap-allocated locals that are referenced by a closure
 that is still accessible need be kept.  Any others would be
 cleaned up just like any other unreachable object.  Also any
 local variables _not_ referenced by closures can be allocated on
 the stack and pruned in the usual way.  No stack frames need
 be kept.

Does this mean that all locals that are referenced by a closure automatically become static (since their only instance reside on heap)? Or will there be a separate stack for them (again, on heap)?

They would be in the heap I would presume. Once the instance of the closure the references them goes away (is no longer referenced itself) these variables would be disposed of by GC. The variable could probably be allocated and deallocated as a group.
 BTW another problem: compiler can't always be sure that closure
 gets returned from the function on a specific call - if, for
 example, it's in conditional statement. If closure references,
 say, a local string, and function is called several times, but
 actually returns closure only once, all instances of that local
 string are preserved till the end of the program...

No. The closure is effectively a reference to a chunk of code and the chunk of data. When all references to it go away, the chunk of data (and all references to said string) are destroyed by the GC. It really isn't that complex to follow. The complex part would be how the compiler decides what goes on the stack (not referenced by any closure) what goes into heap allocated closure blocks. I bet this could get harry if you declare several different closures in a given scope. int f(){ int a, b, c; int F1() = []{print (a, b);}; while(elsewhere()){ int a, d; F1 = [] {print(a, b, d); } int d; F1 = []{print(a, b, d); } The easy way would be to allocate each shared variable separately, but that could be very sub-optimal. Optimization would be possible but they would probably be a great source of implementation errors. For what it's worth, I would love to see anonymous functions in some capacity even if we don't get full fledged closures. The closer to closures the more useful they will be though. Dan
Nov 08 2001
parent la7y6nvo shamko.com writes:
a <a b.c> writes:

 The closure is effectively a reference to a chunk of code and the
 chunk of data.  When all references to it go away, the chunk of data
 (and all references to said string) are destroyed by the GC.  It really
 isn't that complex to follow.  The complex part would be how the
 compiler decides what goes on the stack (not referenced by any closure)
 [and] what goes into heap allocated closure blocks.  I bet this could 
 get harry if you declare several different closures in a given scope.

The rule is simple - any variable that is accessed by any closure would go in the (single) environment object for that function activation. In cases like the function below the storage for the variables in the nested (while) block would be pulled up into the containing storage area. This approach is well understood and used in compilers all the time. Yes?
 	int f(){
 		int a, b, c;
 		int F1() = []{print (a, b);};
 		while(elsewhere()){
 			int a, d;
 			F1 = [] {print(a, b, d);
 		}
 		int d;
 		F1 = []{print(a, b, d);
 	}

Nov 09 2001
prev sibling parent reply "Sean L. Palmer" <spalmer iname.com> writes:
 Issue one:  Is it useful for closures to have access to local
 variables?  (The access referred to is real shared access, not access
 by value or some sort of IN/OUT or value/result semantics.)

 My response is, it is useful, and here's why.  First, it comes up in
 real use, not on every closure certainly, but often enough so that
 there needs to be some way of sharing local state.  Second, closures
 tend to be short, so using some other mechanism that adds code would
 make a substantial difference.  Third, natural expression - using
 local variables inside closures looks natural and also produces
 expectable results, much the same way as the functions (methods)
 inside a class affect the instance variables of the object that's an
 instance of the class.

I agree, they should be able to access local variables of the enclosing function. I don't really think a closure should be able to declare any variables which are visible to the enclosing function. (much like local functions)
 Issue two:  How should closures behave when they access local
 variables whose containing function activation has exited?

 There are two main possibilities:  one, same as taking the address of
 something on the stack (which is to say, a runtime error in the best
 case, and random memory accesses in the more typical case);  two, if
 these variables are allocated so that they can outlive the function
 activation (and are collected by gc), the behavior is just the same as
 the case where the function activation has not exited.

 People who program in C or C++ will probably expect (one).  In spite
 of this I claim (two) is better, for the following reasons.

 First, D is moving in the direction of more benign semantics, for
 example garbage collection and array bounds checking.  So providing
 behavior (two) is consistent with where D is going.

 Second, people who are expecting (two) and get (one) will have to deal
 with what is probably a nasty bug.  On the other hand, people who are
 expecting (one) and get (two) will be pleasantly surprised that the
 "anomalous" condition is handled in a way that works.  It's nicer
 to be pleasantly surprised than unpleasantly surprised.

 Third, the rule for how closures work is now very simple - they work
 the same whether the surrounding function activation has exited or
 not.  Simplifying the programming model means simplifying programming.

I almost disagreed with you, but after rethinking I think you're right. Closures would be much less powerful without being able to outlive the enclosing function. If only there were some way of determining whether a closure was safe to let use the normal stack frame, or if it needed one allocated from the heap! I can't think of a good way without somehow having to declare closure-type parameters to a function as "storable" in much the way that "const" works in C++. Perhaps the compiler can keep a record for each function whether its closure parameters are being treated in an unsafe storable manner or not, and generate the heap stack frame only in those cases. "storable" use being storing the closure argument to any location external to the function or sending the closure to any function which takes a storable closure as parameter. Perhaps "callback" would be a better word for this. Anyway if the symbol table is descriptive enough this could be done in most cases with no heap overhead so long as the closure is used carefully to satisfy these criteria. Anyway I don't think a small heap allocation before iterating through 5000 objects will slow an application appreciably. It is a tradeoff I'd gladly make in most situations where closures are useful. The size is likely small, the lifetime short and, unless stored, predictable. The compiler could very well use a reference counted pointer to maintain this memory instead of GC if it wanted. Question: How would a closure declared in a method of an object be handled? I assume a "this" pointer to the object would be maintained in the closure's stack frame.
Nov 08 2001
parent la7y6nvo shamko.com writes:
 Question:  How would a closure declared in a method of an object be handled?
 I assume a "this" pointer to the object would be maintained in the closure's
 stack frame.

Right, except I think you mean the closure's environment object. The stack frame is created only when the closure is actually invoked.
 I don't really think a closure should be able to declare any
 variables which are visible to the enclosing function. (much like local
 functions)

Complete agreement.
 If only there were some way of determining whether a
 closure was safe to let use the normal stack frame, or if it needed one
 allocated from the heap!  [remainder omitted].

I have to admit I didn't completely understand the suggestion. In time something like this might be important to implement. At first I think it's ignorable as a second-order performance issue.
 Anyway I don't think a small heap allocation before iterating through 5000
 objects will slow an application appreciably.  It is a tradeoff I'd gladly
 make in most situations where closures are useful.  

Right, that's the point I was trying to make above.
Nov 09 2001