D - Closures Part II - local variables outliving function activations

la7y6nvo shamko.com (51/51) Nov 07 2001 To repeat myself from an earlier posting -

Axel Kittenberger (3/5) Nov 07 2001 I don't understand a bit what you're talking all the time about, how can...

la7y6nvo shamko.com (10/17) Nov 07 2001 A closure can outlive its mother function in the same way that

Pavel \"EvilOne\" Minayev (6/14) Nov 07 2001 I wonder how can this be made technically? Since locals are on stack,

Robert W. Cunningham (41/56) Nov 07 2001 Ok, I'll take a stab at it, but in a decidedly non-thread-safe and non-O...
la7y6nvo shamko.com (6/23) Nov 07 2001 The storage for such local variables would be dynamically allocated

Pavel \"EvilOne\" Minayev (21/26) Nov 07 2001 Everything is not that simple though. Since functions are re-entrant,

la7y6nvo shamko.com (8/38) Nov 08 2001 Only the heap-allocated locals that are referenced by a closure

Pavel \"EvilOne\" Minayev (11/17) Nov 08 2001 Does this mean that all locals that are referenced by a closure

a (30/50) Nov 08 2001 They would be in the heap I would presume. Once the instance of the

la7y6nvo shamko.com (7/24) Nov 09 2001 The rule is simple - any variable that is accessed by any closure

Sean L. Palmer (28/61) Nov 08 2001 I agree, they should be able to access local variables of the enclosing

la7y6nvo shamko.com (9/21) Nov 09 2001 Right, except I think you mean the closure's environment object.

la7y6nvo shamko.com writes:

To repeat myself from an earlier posting -

    People who are used to stack-based local variable frames may be
    surprised by the idea that local variables might outlive the function
    activation.  Rather than get into that in too much detail [in that
    posting] I think a follow-up post is a better idea.  Expect that
    follow-up post shortly.

This posting is that follow-up.

There are two issues.  Issue one, is it useful for closures to have
access to local variables?  Issue two, assuming they do, how should
closures behave when they access local variables whose containing
function activations have exited?



Issue one:  Is it useful for closures to have access to local
variables?  (The access referred to is real shared access, not access
by value or some sort of IN/OUT or value/result semantics.)

My response is, it is useful, and here's why.  First, it comes up in
real use, not on every closure certainly, but often enough so that
there needs to be some way of sharing local state.  Second, closures
tend to be short, so using some other mechanism that adds code would
make a substantial difference.  Third, natural expression - using
local variables inside closures looks natural and also produces
expectable results, much the same way as the functions (methods)
inside a class affect the instance variables of the object that's an
instance of the class.



Issue two:  How should closures behave when they access local
variables whose containing function activation has exited?

There are two main possibilities:  one, same as taking the address of
something on the stack (which is to say, a runtime error in the best
case, and random memory accesses in the more typical case);  two, if
these variables are allocated so that they can outlive the function
activation (and are collected by gc), the behavior is just the same as
the case where the function activation has not exited.

People who program in C or C++ will probably expect (one).  In spite
of this I claim (two) is better, for the following reasons.

First, D is moving in the direction of more benign semantics, for
example garbage collection and array bounds checking.  So providing
behavior (two) is consistent with where D is going.

Second, people who are expecting (two) and get (one) will have to deal
with what is probably a nasty bug.  On the other hand, people who are
expecting (one) and get (two) will be pleasantly surprised that the
"anomalous" condition is handled in a way that works.  It's nicer
to be pleasantly surprised than unpleasantly surprised.

Third, the rule for how closures work is now very simple - they work
the same whether the surrounding function activation has exited or
not.  Simplifying the programming model means simplifying programming.


Editorial comment:
==================

The situation with long-lasting local variable lifetimes is similar to
the situation with garbage collection (and indeed they are related).
At first it seems a little unnatural, because it's not what one is
used to;  but after having it, it's hard to imagine living without.

(End of editorial comment.)

Nov 07 2001

Axel Kittenberger <axel dtone.org> writes:

 Issue two:  How should closures behave when they access local
 variables whose containing function activation has exited?

I don't understand a bit what you're talking all the time about, how can a 
closure outlife it's mother function? Is it an own thread?

- Axel

Nov 07 2001

la7y6nvo shamko.com writes:

A closure can outlive its mother function in the same way that
the address of a local variable can outlive the stack frame of
the mother function - by being stored in a variable that
has a longer lifetime than the mother function stack frame.
And in fact the mechanism for dealing with local variables used
by closures could also be used for any local variable that had
its address taken, so that no dangling-pointer-to-stack-variables
could occur.  I think it would be nice if the compiler would
support an option like that, but that's a whole other digression.

Axel Kittenberger <axel dtone.org> writes:

 Issue two:  How should closures behave when they access local
 variables whose containing function activation has exited?

 
 I don't understand a bit what you're talking all the time about, how can a 
 closure outlife it's mother function? Is it an own thread?
 
 - Axel

Nov 07 2001

"Pavel \"EvilOne\" Minayev" <evilone omen.ru> writes:

<la7y6nvo shamko.com> wrote in message
news:s7c8zdiubo6.fsf michael.shamko.com...

 Issue two:  How should closures behave when they access local
 variables whose containing function activation has exited?

 There are two main possibilities:  one, same as taking the address of
 something on the stack (which is to say, a runtime error in the best
 case, and random memory accesses in the more typical case);  two, if
 these variables are allocated so that they can outlive the function
 activation (and are collected by gc), the behavior is just the same as
 the case where the function activation has not exited.

I wonder how can this be made technically? Since locals are on stack,
returning from the function makes them invalid. Any other mechanism
for storing them (like, say, a separate RTL-maintained stack) would
be waay slower.

Nov 07 2001

"Robert W. Cunningham" <rwc_2001 yahoo.com> writes:

Pavel \"EvilOne\" Minayev wrote:

 <la7y6nvo shamko.com> wrote in message
 news:s7c8zdiubo6.fsf michael.shamko.com...

 Issue two:  How should closures behave when they access local
 variables whose containing function activation has exited?

 There are two main possibilities:  one, same as taking the address of
 something on the stack (which is to say, a runtime error in the best
 case, and random memory accesses in the more typical case);  two, if
 these variables are allocated so that they can outlive the function
 activation (and are collected by gc), the behavior is just the same as
 the case where the function activation has not exited.

 I wonder how can this be made technically? Since locals are on stack,
 returning from the function makes them invalid. Any other mechanism
 for storing them (like, say, a separate RTL-maintained stack) would
 be waay slower.

Ok, I'll take a stab at it, but in a decidedly non-thread-safe and non-OO
way...

char *utoa( unsigned int i )
{
   // An unsigned int can be 128 bits or more on some systems.  Allow for
it, and for the trailing null
  static char buf[1 + sizeof(unsigned int) * 3];
  char *head, *tail, temp;

  tail = head = &buf[0];

  // Get the digits in low-to-high order
  while (i > 0)
  {
    *tail++ = '0' + (i%10);
    i /= 10;
  }
  // Append a null and point to the last digit.
  *tail-- = '\0';
 // Place the digits in the right order.
  while (head < tail)
  {
    temp = *head;
    *head++ = *tail;
    *tail-- = temp;
  }
  return &buf[0];
}


Yes, it is almost like one of our favorite functions, itoa(), doing what it
should.  But the static buffer it allocates cannot ever be free()ed or
GC'ed until it can be guaranteed that the function will not be called
again.

How would/could this situation be better handled via closures, instead of
via "static"?  Should local variables ever outlast their scope without an
explicit declaration to permit it (and guarantee it)?

Of course, in D we could always simply change the declaration of buf[] to
something like this:

   char *buf;
   buf = new char[(sizeof(unsigned int) * 3];

And then simply let the GC handle the free() for us in the context of the
caller, right?  Or am I missing something here?


-BobC

Nov 07 2001

la7y6nvo shamko.com writes:

The storage for such local variables would be dynamically allocated
using the regular storage allocator.  Yes, it would be slower than
just incrementing a stack pointer, but not a lot slower than just
a regular function call - storage allocators are pretty good at
running quickly these days.

"Pavel \"EvilOne\" Minayev" <evilone omen.ru> writes:

 <la7y6nvo shamko.com> wrote in message
 news:s7c8zdiubo6.fsf michael.shamko.com...
 
 Issue two:  How should closures behave when they access local
 variables whose containing function activation has exited?

 There are two main possibilities:  one, same as taking the address of
 something on the stack (which is to say, a runtime error in the best
 case, and random memory accesses in the more typical case);  two, if
 these variables are allocated so that they can outlive the function
 activation (and are collected by gc), the behavior is just the same as
 the case where the function activation has not exited.

 
 I wonder how can this be made technically? Since locals are on stack,
 returning from the function makes them invalid. Any other mechanism
 for storing them (like, say, a separate RTL-maintained stack) would
 be waay slower.

Nov 07 2001

"Pavel \"EvilOne\" Minayev" <evilone omen.ru> writes:

<la7y6nvo shamko.com> wrote in message
news:s7cvggmrkt1.fsf michael.shamko.com...

 The storage for such local variables would be dynamically allocated
 using the regular storage allocator.  Yes, it would be slower than
 just incrementing a stack pointer, but not a lot slower than just
 a regular function call - storage allocators are pretty good at
 running quickly these days.

Everything is not that simple though. Since functions are re-entrant,
you can't just allocate locals on heap - you have to manage a stack
for them yourself (which is probably slower). This is another source
of problems: suppose we have function A called 3 times, so now there's
three copies of its locals on the stack. The last time we called it,
it returned, directly or indirectly, a closure referencing one of
its locals, so we can't clean the stack, at least not completely. SO
WE HAVE ALL THE LOCALS GENERATED BY PREVIOUS FUNCTION CALLS HANGING
THERE IN THE MEMORY WASTING SPACE - without knowing when to release
them (since closure has outlived the function and can be activated
at any point of the program)! Now what if the function is recursive
and calls itself a thousand times?

If all locals are preserved, not only those that are referenced from
closures, as you suggest, than this problem will arise with each and
every function. To make it simpler - since every function has its own
copy of locals, and since it can possibly return a pointer to one of
locals, and since locals should be able to outlive the function, we
cannot clear the stack when leaving the function at all!

...am I missing something?

Nov 07 2001

la7y6nvo shamko.com writes:

Only the heap-allocated locals that are referenced by a closure
that is still accessible need be kept.  Any others would be
cleaned up just like any other unreachable object.  Also any
local variables _not_ referenced by closures can be allocated on
the stack and pruned in the usual way.  No stack frames need
be kept.

Does this answer your question?

"Pavel \"EvilOne\" Minayev" <evilone omen.ru> writes:

 <la7y6nvo shamko.com> wrote in message
 news:s7cvggmrkt1.fsf michael.shamko.com...
 
 The storage for such local variables would be dynamically allocated
 using the regular storage allocator.  Yes, it would be slower than
 just incrementing a stack pointer, but not a lot slower than just
 a regular function call - storage allocators are pretty good at
 running quickly these days.

 
 Everything is not that simple though. Since functions are re-entrant,
 you can't just allocate locals on heap - you have to manage a stack
 for them yourself (which is probably slower). This is another source
 of problems: suppose we have function A called 3 times, so now there's
 three copies of its locals on the stack. The last time we called it,
 it returned, directly or indirectly, a closure referencing one of
 its locals, so we can't clean the stack, at least not completely. SO
 WE HAVE ALL THE LOCALS GENERATED BY PREVIOUS FUNCTION CALLS HANGING
 THERE IN THE MEMORY WASTING SPACE - without knowing when to release
 them (since closure has outlived the function and can be activated
 at any point of the program)! Now what if the function is recursive
 and calls itself a thousand times?
 
 If all locals are preserved, not only those that are referenced from
 closures, as you suggest, than this problem will arise with each and
 every function. To make it simpler - since every function has its own
 copy of locals, and since it can possibly return a pointer to one of
 locals, and since locals should be able to outlive the function, we
 cannot clear the stack when leaving the function at all!
 
 ...am I missing something?

Nov 08 2001

"Pavel \"EvilOne\" Minayev" <evilone omen.ru> writes:

<la7y6nvo shamko.com> wrote in message
news:s7csnbpsmx5.fsf michael.shamko.com...

 Only the heap-allocated locals that are referenced by a closure
 that is still accessible need be kept.  Any others would be
 cleaned up just like any other unreachable object.  Also any
 local variables _not_ referenced by closures can be allocated on
 the stack and pruned in the usual way.  No stack frames need
 be kept.

Does this mean that all locals that are referenced by a closure
automatically become static (since their only instance reside on heap)?
Or will there be a separate stack for them (again, on heap)?

BTW another problem: compiler can't always be sure that closure
gets returned from the function on a specific call - if, for
example, it's in conditional statement. If closure references,
say, a local string, and function is called several times, but
actually returns closure only once, all instances of that local
string are preserved till the end of the program...

Nov 08 2001

a <a b.c> writes:

Pavel \"EvilOne\" Minayev wrote:
 
 <la7y6nvo shamko.com> wrote in message
 news:s7csnbpsmx5.fsf michael.shamko.com...
 
 Only the heap-allocated locals that are referenced by a closure
 that is still accessible need be kept.  Any others would be
 cleaned up just like any other unreachable object.  Also any
 local variables _not_ referenced by closures can be allocated on
 the stack and pruned in the usual way.  No stack frames need
 be kept.

 
 Does this mean that all locals that are referenced by a closure
 automatically become static (since their only instance reside on heap)?
 Or will there be a separate stack for them (again, on heap)?

They would be in the heap I would presume.  Once the instance of the
closure the references them goes away (is no longer referenced itself)
these variables would be disposed of by GC.  The variable could probably
be allocated and deallocated as a group.
 
 BTW another problem: compiler can't always be sure that closure
 gets returned from the function on a specific call - if, for
 example, it's in conditional statement. If closure references,
 say, a local string, and function is called several times, but
 actually returns closure only once, all instances of that local
 string are preserved till the end of the program...

No.  The closure is effectively a reference to a chunk of code and the
chunk of data.  When all references to it go away, the chunk of data
(and all references to said string) are destroyed by the GC.  It really
isn't that complex to follow.  The complex part would be how the
compiler decides what goes on the stack (not referenced by any closure)
what goes into heap allocated closure blocks.  I bet this could get
harry if you declare several different closures in a given scope.

	int f(){
		int a, b, c;
		int F1() = []{print (a, b);};
		while(elsewhere()){
			int a, d;
			F1 = [] {print(a, b, d);
		}
		int d;
		F1 = []{print(a, b, d);
	}

The easy way would be to allocate each shared variable separately, but
that could be very sub-optimal.  Optimization would be possible but they
would probably be a great source of implementation errors.  For what
it's worth, I would love to see anonymous functions in some capacity
even if we don't get full fledged closures.  The closer to closures the
more useful they will be though.

Dan

Nov 08 2001

la7y6nvo shamko.com writes:

a <a b.c> writes:

 The closure is effectively a reference to a chunk of code and the
 chunk of data.  When all references to it go away, the chunk of data
 (and all references to said string) are destroyed by the GC.  It really
 isn't that complex to follow.  The complex part would be how the
 compiler decides what goes on the stack (not referenced by any closure)
 [and] what goes into heap allocated closure blocks.  I bet this could 
 get harry if you declare several different closures in a given scope.

The rule is simple - any variable that is accessed by any closure
would go in the (single) environment object for that function
activation.  In cases like the function below the storage
for the variables in the nested (while) block would be pulled
up into the containing storage area.  This approach is well
understood and used in compilers all the time.  Yes?


 	int f(){
 		int a, b, c;
 		int F1() = []{print (a, b);};
 		while(elsewhere()){
 			int a, d;
 			F1 = [] {print(a, b, d);
 		}
 		int d;
 		F1 = []{print(a, b, d);
 	}

Nov 09 2001

"Sean L. Palmer" <spalmer iname.com> writes:

 Issue one:  Is it useful for closures to have access to local
 variables?  (The access referred to is real shared access, not access
 by value or some sort of IN/OUT or value/result semantics.)

 My response is, it is useful, and here's why.  First, it comes up in
 real use, not on every closure certainly, but often enough so that
 there needs to be some way of sharing local state.  Second, closures
 tend to be short, so using some other mechanism that adds code would
 make a substantial difference.  Third, natural expression - using
 local variables inside closures looks natural and also produces
 expectable results, much the same way as the functions (methods)
 inside a class affect the instance variables of the object that's an
 instance of the class.

I agree, they should be able to access local variables of the enclosing
function.  I don't really think a closure should be able to declare any
variables which are visible to the enclosing function. (much like local
functions)

 Issue two:  How should closures behave when they access local
 variables whose containing function activation has exited?

 There are two main possibilities:  one, same as taking the address of
 something on the stack (which is to say, a runtime error in the best
 case, and random memory accesses in the more typical case);  two, if
 these variables are allocated so that they can outlive the function
 activation (and are collected by gc), the behavior is just the same as
 the case where the function activation has not exited.

 People who program in C or C++ will probably expect (one).  In spite
 of this I claim (two) is better, for the following reasons.

 First, D is moving in the direction of more benign semantics, for
 example garbage collection and array bounds checking.  So providing
 behavior (two) is consistent with where D is going.

 Second, people who are expecting (two) and get (one) will have to deal
 with what is probably a nasty bug.  On the other hand, people who are
 expecting (one) and get (two) will be pleasantly surprised that the
 "anomalous" condition is handled in a way that works.  It's nicer
 to be pleasantly surprised than unpleasantly surprised.

 Third, the rule for how closures work is now very simple - they work
 the same whether the surrounding function activation has exited or
 not.  Simplifying the programming model means simplifying programming.

I almost disagreed with you, but after rethinking I think you're right.
Closures would be much less powerful without being able to outlive the
enclosing function.  If only there were some way of determining whether a
closure was safe to let use the normal stack frame, or if it needed one
allocated from the heap!  I can't think of a good way without somehow having
to declare closure-type parameters to a function as "storable" in much the
way that "const" works in C++.  Perhaps the compiler can keep a record for
each function whether its closure parameters are being treated in an unsafe
storable manner or not, and generate the heap stack frame only in those
cases.  "storable" use being storing the closure argument to any location
external to the function or sending the closure to any function which takes
a storable closure as parameter.  Perhaps "callback" would be a better word
for this.  Anyway if the symbol table is descriptive enough this could be
done in most cases with no heap overhead so long as the closure is used
carefully to satisfy these criteria.

Anyway I don't think a small heap allocation before iterating through 5000
objects will slow an application appreciably.  It is a tradeoff I'd gladly
make in most situations where closures are useful.  The size is likely
small, the lifetime short and, unless stored, predictable.  The compiler
could very well use a reference counted pointer to maintain this memory
instead of GC if it wanted.

Question:  How would a closure declared in a method of an object be handled?
I assume a "this" pointer to the object would be maintained in the closure's
stack frame.

Nov 08 2001

la7y6nvo shamko.com writes:

 Question:  How would a closure declared in a method of an object be handled?
 I assume a "this" pointer to the object would be maintained in the closure's
 stack frame.

Right, except I think you mean the closure's environment object.
The stack frame is created only when the closure is actually
invoked.


 I don't really think a closure should be able to declare any
 variables which are visible to the enclosing function. (much like local
 functions)

Complete agreement.


 If only there were some way of determining whether a
 closure was safe to let use the normal stack frame, or if it needed one
 allocated from the heap!  [remainder omitted].

I have to admit I didn't completely understand the suggestion.
In time something like this might be important to implement.
At first I think it's ignorable as a second-order performance
issue.


 Anyway I don't think a small heap allocation before iterating through 5000
 objects will slow an application appreciably.  It is a tradeoff I'd gladly
 make in most situations where closures are useful.  

Right, that's the point I was trying to make above.

Nov 09 2001

D Programming

C/C++ Programming

Other

D - Closures Part II - local variables outliving function activations