www.digitalmars.com         C & C++   DMDScript  

D - Closures in D

reply la7y6nvo shamko.com writes:
I'd like to talk about the idea of having closures in D.  For those
who aren't used to the term closure, a closure is something like a
pointer to function, and usually is created by a syntactic construct
that appears in source code in the form of an anonymous "code
literal".


Closures in Smalltalk
=====================

I'll start by talking about closures in Smalltalk, which calls them
"Blocks".  (I'm assuming people can read Smalltalk syntax, which may
be a bad assumption, but that's what I'm assuming.)

A block in Smalltalk is code that appears between square brackets.
An example would be

    symbolTable at: identifier ifAbsent: [ defaultSymbol ].

The '[ defaultSymbol ]' is a block that is executed only if the
'identifier' key is not present in 'symbolTable'.

This example isn't very motivating, because defaultSymbol is just
a value.  Consider this however

    symbolTable at: identifier ifAbsent: [ UndeclaredSymbol new ].

The '[ UndeclaredSymbol new ]' block only gets executed if the
identifier is not present.  This could be useful because making a
new UndeclaredSymbol might involve lots of computation.

The examples above return values, but blocks can also have side
effects.  For example,

    symbolTable at: identifier ifAbsent: [
	undeclaredIdentifiers add: identifier.
	UndeclaredSymbol new
    ].

Notice that the last expression evaluated in a block is what is
returned for the value of the block.  (Please, no flames about
bracket placement style - I know some people prefer the open
bracket to go on a line by itself.)

Blocks can also take arguments, for example -

    symbolTable keysAndValuesDo: [  :identifier  :entry  |
        output nextPut:
	    '(', identifier printString, '->', entry printString, ')'.
    ].

which causes a printable form of a symbol table to be added to
the 'output' object.

Blocks get much of their power from being able to access local
context.  In an example above, the block contains references to
'identifier' and 'undeclaredIdentifiers', which might be instance
variables or perhaps local variables.  This ability puts blocks on an
equal footing with the more usual if/then/else and looping control
structures.  (In fact these "built-in" control structures are
themselves implemented using blocks in Smalltalk, but that point is
incidental - the main thing is that having access to local context
gives an important kind of expressive power.)

Incidentally, blocks can declare variables local to the block.
(I'm not going to show an example for this.)

Blocks also have a capability that might be surprising to people
used to C, namely the ability to return (a value) from the method
that contains the block.  So for example

    referencesTo: identifier
        "Answer a collection of all references to identifier"

	| entry |  "this is a local variable"

	entry := self at: identifier ifAbsent: [ ^ Set new ].
	^ programSource referencesToEntry: entry

The ^ in Smalltalk is like 'return' in C.  The block '[ ^ Set new ]'
will, if executed, immediately return a value from the 'referencesTo:'
method, and the following statement will not be executed at all.

Here again, the ability to return a value from the enclosing function
is one that adds expressive power, and puts blocks on an equal footing
with built-in control structures.

Blocks in Smalltalk are fairly representative of closures that are
available in other programming languages, taking into account that
Smalltalk is more of an imperative language than a functional language
(in the sense of "functional programming").


Closures in D
=============

For the sake of discussion let's assume that closures will be included
in D.  What are the implications of this?  Clearly there are syntax
questions (about which more later), and there are questions about what
capabilities should be supported.  For the moment let's use

    [ ParameterList ]{ FunctionBody }

as the syntax for a closure in D, without being too particular about
types or return values, and look at semantic issues.

First, the body of a closure is the same as that for a function body,
which means closures can declare variables local to the closure
activation.

Second, the code in the body of the closure should have access to the
context of the surrounding code, notably instance variables and local
variables.  Instance variables aren't a problem (just have the closure
keep a pointer to the object in question), but local variables are a
little trickier, since a closure can outlive the stack frame of the
method that created it.  This could be handled by taking any variables
used in closure bodies and allocating them in a heap object and having
the closure hang on to a pointer to that.  Yes, there is a cost
associated with doing that allocation, but only for the cases where
there are closures that reference local variables.

[Note added during editing - People who are used to stack-based local
variable frames may be surprised by the idea that local variables
might outlive the function activation.  Rather than get into that
in too much detail here I think a follow-up post is a better idea.
Expect that shortly.]

Third, recursion - as with ordinary functions, closures need a
stack frame for their local variables so recursion can occur.
A closure activation is not the same as a closure.

Fourth, nested closures.  Closure bodies can contain other closure
bodies.  This brings with it all the usual machinery needed for
nested functions, in particular accessing local variables of
containing closures (when that happens).  As with functions,
closures that have local variables accessed by contained closures
would allocate those local variables in an object in the heap.

Fifth, interchangeability with function pointers.  Getting a closure
to masquerade as a function pointer is a little tricky (because a
function pointer is only one pointer, whereas a closure is two
pointers - a pointer to code and a pointer to context that the code
runs in).  Maybe it's useful to do that, I don't know.  But going the
other way is easy - turning a function pointer into a closure just
means adding a null pointer for execution context, and perhaps a dummy
closure body that simply invokes through the function pointer.  The
more closures and function pointers are interchangeable (as far as the
function getting them as arguments is concerned), the more the
language follows the "Law of Least Astonishment".

Incidentally, closures being interchangeable with function pointers
means closures should have all the same parameter passing modes.
In particular, if functions have OUT and INOUT parameters, then
closures should have them.

Sixth, invoking a closure.  How about the same syntax as that
for invoking through a function pointer?

    (*closure_parameter)( argument, argument, argument, ... )

Seventh, closure returns.  It's important to distinguish between
returning from a closure and returning from a function body containing
a closure.  Blocks in Smalltalk use the usual return syntax (^) to
indicate return from the containing method, and use the last
expression evaluated as the return value for the block.  Should there
be a way to return a value from a closure out of the middle of the
closure body?  Should 'return' mean return from the closure or return
from the function?  I can see arguments on both sides.

For nested closures, you can imagine an "intermediate" return that
returns from a containing closure but not the outermost function body.
Blocks in Smalltalk don't include this capability (innermost return or
outermost return are the only choices);  I have no experience with it,
so I don't know how useful it is.  But if D is to include multi-level
breaks, it probably should also include multi-level returns.

Eighth, other control structures.  Consider the following code 
fragment -

    while(  p < p_limit  ){
	candidate = *p;
	candidate.enumerate_parts( 
	    [ part ]{
	        if(  part.passesTest()  )  break;
	    }
	);
	p ++;
    }

Notice what's happening.  The closure is testing a value, and in the
event that the test succeeds, the 'break' breaks out of the while in
the function body outside the closure - a non-local transfer of
control.  There are similar examples for 'continue' and (dare I say
it) 'goto'.  These control structures aren't that different from
uplevel returns in terms of what mechanisms are needed.

Notice that any non-local control transfer - either a return out of or
a break/continue/goto into - a function body containing a closure can
fail at runtime if that function activation has already returned.
This is true for intermediate (closure) function bodies as well as
outermost function bodies.  Presumably such a case would generate
some kind of exception.

Despite the difficulties of these exceptional conditions (and the
reactions I'm sure some people will have at the idea of using 'goto'
out of a closure), I would argue that if the language is going to be
orthogonal then as much as possible closures should look like the
regular built-in control structures and non-local control transfers
should be supported.


Syntax for Closures
===================

For invoking a closure - how about using the pointer to function
notation

    (*closure)( argument, argument, argument, ... )

Note:  personally I see nothing wrong with using the unadorned
function call notation for pointers-to-function

    function_pointer( argument, argument, argument, ... )

and would use that for closures if it were allowed for function
pointers, but I don't want to start an argument on that topic.

For declaring a closure variable - this should be similar (not
identical) to declaring a function pointer.  How will function
pointers be declared in D?

For writing a closure body - well I like the syntax I've been
using, assuming it doesn't cause too many problems:

    [  ParameterList  ]  {  FunctionBody  }

The types of the parameters and the return value need to go somewhere,
which is sort of a pain.  The parameter types might go in the 
parameter list, much like function definitions, e.g.,

    [ int a, int b ]  {  a < b ? a : b  }

but for the return type, I'm at more of a loss.  Perhaps other people
will have some suggestions.


Final Remarks
=============

I think closures should be considered for D.  I've tried to lay out
the basic issues and implications of D having closures.  I can say
based on experience that closures are extremely useful - given a
choice between closures and exception handling, I think closures
are more generally useful.  Also, I think having closures in some
form is a prerequisite to writing a good set of Collection classes,
which is a large part of the motivation for having some form of
generics (or templates).
Nov 07 2001
next sibling parent reply Axel Kittenberger <axel dtone.org> writes:
Quite a lot of text, and honestly it only confused me. 

What's now the advantage of a closure? How is it implemented? Especially in 
featureless systems where C originally was developed for. In UserSpace 
resources are easy to get/implement, but what in kernel space? In embedded 
OS-less applications? This places are the real motherhood of C. 

Maybe more for a presentational hint, try to get faster to the point, and 
explain it good and in a clean easy to understand way. This text sounds 
just so universitary from professors who's thing they most love in life is 
to hear themself's talking.

- Axel
Nov 07 2001
parent reply la7y6nvo shamko.com writes:
Here's an example of where a closure might be useful:

    quicksort( 
	string_pointer_array, 
	number_of_elements,
	[ char * a, char * b ]{  strcmp( a, b )  }
    );

So we get to write the compare function right here rather than
having to define a separate function.

Closures are also useful for dealing with collections:

    some_collection.do( 
	[ ElementType element ]{
	    if(  element.satisfies_some_condition()  )  return  element;
	}
    );
    return  NULL;

This search pattern occurs frequently, so we might very well have
a function that does it already, in which case we could write

    ElementType e = 
	some_collection.first_element_that(
	    [ ElementType e ]{  e.satisfies_some_condition()  }
	);

Closures can also be used to create control structures that haven't
(yet) gotten into the language:

    foreach_element_pair( array_a, array_b, 
	[ ElementType a_element, ElementType b_element ]{
	    /* do something with a_element and b_element */
	}
    );

Do these examples help?

Implementation - closures don't need interpretation or any sort
of runtime compilation;  they are compiled in the same way that
functions are.  The non-local control transfers need some sort
of mechanism similar to an exception handling mechanism - I
expect Walter can provide some details here.


To respond to the individual points -

 Quite a lot of text, and honestly it only confused me. 

Sorry for the confusion. There was a lot of ground to cover, and I thought breaking it into small pieces would help the presentation.
 What's now the advantage of a closure? How is it implemented? Especially in 
 featureless systems where C originally was developed for. In UserSpace 
 resources are easy to get/implement, but what in kernel space? In embedded 
 OS-less applications? This places are the real motherhood of C. 

Compiling a closure is much like compiling a function. The cost of calling a closure is about the same as calling through a pointer to function; the major difference is that a closure needs an extra pointer to establish its execution context. Functional closures like the quicksort example above have essentially no more overhead than calling through pointer to function. Closures that need access to local context require one heap allocation on function entry (regardless of how many closures in the function there are). I tried to talk about some advantages up above.
 Maybe more for a presentational hint, try to get faster to the point, and 
 explain it good and in a clean easy to understand way. This text sounds 
 just so universitary from professors who's thing they most love in life is 
 to hear themself's talking.

I appreciate the advice.
Nov 07 2001
parent reply "Roberto Mariottini" <rmariottini lycosmail.com> writes:
<la7y6nvo shamko.com> wrote in message
news:s7c4ro6ta1a.fsf michael.shamko.com...
 Here's an example of where a closure might be useful:

 string_pointer_array, 
 number_of_elements,
 [ char * a, char * b ]{  strcmp( a, b )  }
     );
 
 So we get to write the compare function right here rather than
 having to define a separate function.
 
 Closures are also useful for dealing with collections:
 
     some_collection.do( 
 [ ElementType element ]{
     if(  element.satisfies_some_condition()  )  return  element;
 }
     );
     return  NULL;
 
 This search pattern occurs frequently, so we might very well have
 a function that does it already, in which case we could write
 
     ElementType e = 
 some_collection.first_element_that(
     [ ElementType e ]{  e.satisfies_some_condition()  }
 );
 
 Closures can also be used to create control structures that haven't
 (yet) gotten into the language:
 
     foreach_element_pair( array_a, array_b, 
 [ ElementType a_element, ElementType b_element ]{
     /* do something with a_element and b_element */
 }
     );
 
 Do these examples help?

If I understand correctly, the only thing similar to Closures that I've seen are Java inner (and anonymous) classes. Does inner classes support resolve the problem? Ciao
Nov 09 2001
next sibling parent reply la7y6nvo shamko.com writes:
"Roberto Mariottini" <rmariottini lycosmail.com> writes:

 If I understand correctly, the only thing similar to Closures that I've seen
 are Java inner (and anonymous) classes.
 
 Does inner classes support resolve the problem?

Nested classes (assuming that's what Java inner classes are) seem like a related but distinct mechanism. An important aspect of closures is being able to write them right where you need them rather than defining them elsewhere. The experience I have with nested classes (which experience is in C++) is that nested classes don't provide this capability, which is a significant lack. Beyond that, it's possible that some sort of nested class facility could provide a semantically equivalent mechanism. I'm not aware of any however.
Nov 09 2001
parent reply "Roberto Mariottini" <rmariottini lycosmail.com> writes:
<la7y6nvo shamko.com> wrote innews:s7cadxvl82v.fsf michael.shamko.com...
 "Roberto Mariottini" <rmariottini lycosmail.com> writes:

 If I understand correctly, the only thing similar to Closures that I've


 are Java inner (and anonymous) classes.

 Does inner classes support resolve the problem?

Nested classes (assuming that's what Java inner classes are) seem like a related but distinct mechanism. An important aspect of closures is being able to write them right where you need them rather than defining them elsewhere. The experience I have with nested classes (which experience is in C++) is that nested classes don't provide this capability, which is a significant lack.

Agreed. But inner classes are not simply nested classes. A nested class is what in Java is called "static inner class", while non-static inner classes behave much like closures. Ciao
Nov 12 2001
parent la7y6nvo shamko.com writes:
The message doesn't say what either "static inner classes" or
"non-static inner classes" are.

After doing a bit of research on the web, I found these pages:

    http://java.sun.com/products/jdk/1.1/docs/guide/innerclasses/spec/innerclasses.doc1.html
    http://java.sun.com/products/jdk/1.1/docs/guide/innerclasses/spec/innerclasses.doc2.html

The phrase "nested classes" as I used the term was intended to include
both what Java calls static inner classes and (non-static) inner
classes.  Java allows anonymous inner classes which can be written
in expressions.

Java inner classes provide some capabilities that closures do not:

   1)  Multiple instances can be made;

   2)  They have encapsulated state (the inner class's instance
       variables);

   3)  They can have more than one "entry point" (method).


Closures are less general in some ways but also offer some advantages:

   1)  Closures offer a more compact notation for what they do;

   2)  Closures allow non-local control transfers (which as near as
       I can tell inner classes in Java do not).


So I don't see either closures or inner classes as providing all
the capabilities of the other.  Like the earlier message said,
they are related but distinct mechanisms.

Perhaps inner classes are useful enough to merit inclusion.  That's
a big topic for discussion.

The advantages listed for closures argue for closures being considered
for inclusion in D whether inner classes are included or not.

That all make sense?


"Roberto Mariottini" <rmariottini lycosmail.com> writes:

 <la7y6nvo shamko.com> wrote innews:s7cadxvl82v.fsf michael.shamko.com...
 "Roberto Mariottini" <rmariottini lycosmail.com> writes:

 If I understand correctly, the only thing similar to Closures that I've


 are Java inner (and anonymous) classes.

 Does inner classes support resolve the problem?

Nested classes (assuming that's what Java inner classes are) seem like a related but distinct mechanism. An important aspect of closures is being able to write them right where you need them rather than defining them elsewhere. The experience I have with nested classes (which experience is in C++) is that nested classes don't provide this capability, which is a significant lack.

Agreed. But inner classes are not simply nested classes. A nested class is what in Java is called "static inner class", while non-static inner classes behave much like closures. Ciao

Nov 12 2001
prev sibling parent Axel Kittenberger <axel dtone.org> writes:
 If I understand correctly, the only thing similar to Closures that I've
 seen are Java inner (and anonymous) classes.
 
 Does inner classes support resolve the problem?

What an insight! Yes thinking about it inner classes are the same thing in java, only with another skin. (They running frame is created upon execution, it can outlive it's father routine, you can have several of them at the same time etc... ) So I used already something like closures without knowing it :o) - Axel -- |D) http://www.dtone.org
Nov 09 2001
prev sibling next sibling parent reply "Pavel \"EvilOne\" Minayev" <evilone omen.ru> writes:
<la7y6nvo shamko.com> wrote in message
news:s7cbsieubtt.fsf michael.shamko.com...

 form is a prerequisite to writing a good set of Collection classes,
 which is a large part of the motivation for having some form of
 generics (or templates).

First, I don't understand the need of closures for Collections (not that I understand them - closures - very well at all =)). But since D has dynamic and associative arrays, I think most requirements for containers are covered.
Nov 07 2001
parent reply "Sean L. Palmer" <spalmer iname.com> writes:
Dynamic and associative arrays barely scratch the surface of container
types.

Consider a video game, which maintains its world database as an octree of
objects of various types.  Would it not be nice to be able to write code
something like this:?

octreeScene.ForEach( [ octreeNode node ] { if ( node.IsAlive() &&
camera.CanSee(node.boundSphere) ) node.Render(); } );

This is alot of expressive power... something that may help D out alot.  I
really needs something to make up for lack of generics.

The alternative is a fairly clunky functor object.  This is almost a way to
get the compiler to write functor objects for you... I have a feeling they'd
be pretty useful for callbacks as well.

Sean

"Pavel "EvilOne" Minayev" <evilone omen.ru> wrote in message
news:9sb9jv$ik3$1 digitaldaemon.com...
 <la7y6nvo shamko.com> wrote in message
 news:s7cbsieubtt.fsf michael.shamko.com...

 form is a prerequisite to writing a good set of Collection classes,
 which is a large part of the motivation for having some form of
 generics (or templates).

First, I don't understand the need of closures for Collections (not that I understand them - closures - very well at all =)). But since D has dynamic and associative arrays, I think most requirements for containers are covered.

Nov 08 2001
parent "Pavel \"EvilOne\" Minayev" <evilone omen.ru> writes:
"Sean L. Palmer" <spalmer iname.com> wrote in message
news:9sdl9d$28id$1 digitaldaemon.com...
 Dynamic and associative arrays barely scratch the surface of container
 types.

 Consider a video game, which maintains its world database as an octree of
 objects of various types.  Would it not be nice to be able to write code
 something like this:?

 octreeScene.ForEach( [ octreeNode node ] { if ( node.IsAlive() &&
 camera.CanSee(node.boundSphere) ) node.Render(); } );

 This is alot of expressive power... something that may help D out alot.  I
 really needs something to make up for lack of generics.

for-each loops is something that I was going to offer and that, IMHO, must be in the language that supports dynamic and associative arrays. So, this example could be rewritten as (note, this is also a syntax proposal): for(node: octreeNode) { if (node.IsAlive() && camera.CanSee(node.boundSphere)) node.Render(); }
Nov 08 2001
prev sibling next sibling parent reply "Pavel \"EvilOne\" Minayev" <evilone omen.ru> writes:
<la7y6nvo shamko.com> wrote in message
news:s7cbsieubtt.fsf michael.shamko.com...

 I'd like to talk about the idea of having closures in D.

In fact, after reading this one more time, I really can't understand HOW can this be implemented in a non-interpreted language like D. Nor do I see any advantages of it... Maybe I'm wrong. I'm not sure that I understood anything right, besides, I'm not familiar with Smalltalk, so could you please give a shorter and more self-explanatory example? =)
Nov 07 2001
parent reply Russell Borogove <kaleja estarcion.com> writes:
Pavel \"EvilOne\" Minayev wrote:
 
 <la7y6nvo shamko.com> wrote in message
 news:s7cbsieubtt.fsf michael.shamko.com...
 
 I'd like to talk about the idea of having closures in D.

In fact, after reading this one more time, I really can't understand HOW can this be implemented in a non-interpreted language like D. Nor do I see any advantages of it...

Closures can be used as a form of generic programming, but in a slightly different way than C++ templates. Perl has them and makes them very convenient to use; Perl has the advantage that the run-time environment is guaranteed to include a Perl compiler[1]. Here's a trivial example of a closure in Perl: for my $color (qw[red yellow orange green blue purple violet]) { *$color = sub { qq<<FONT COLOR="\U$color\E"> _</FONT>> }; } Don't worry about the hairballs, that's just Perl. :) This automatically generates functions/subroutines red(), yellow(), orange()... etc., which generate some HTML to render strings in the specified color. In C/C++, you could approach this with #defined macros (except for the trick where function red() generates the color tag "RED"), but closures offer quite a bit more power than this. The above example in and of itself could be evaluated completely at compile time, but you could hypothetically replace the list of color names above with a list of arbitrary dynamic text strings. At this point, you _have_ to have dynamic binding and a compiler in the runtime. My guess is that this need is incompatible with the desire to have D be easy to implement. -RB [1] Inasmuch as perl is a compiled language.
Nov 07 2001
next sibling parent "Pavel \"EvilOne\" Minayev" <evilone omen.ru> writes:
"Russell Borogove" <kaleja estarcion.com> wrote in message
news:3BE98692.7994A03C estarcion.com...

 The above example in and of itself could be evaluated completely at
 compile time, but you could hypothetically replace the list of color
 names above with a list of arbitrary dynamic text strings. At this
 point, you _have_ to have dynamic binding and a compiler in the
 runtime. My guess is that this need is incompatible with the desire
 to have D be easy to implement.

Thanks for explaining, now everything seems much clearer. And I don't think that it's worth implementing. As you've said it requires dynamic binding, which would probably bloat the code... no thanks. Just not the "D way", I suppose - or at least it feels like that.
Nov 07 2001
prev sibling parent la7y6nvo shamko.com writes:
The Perl example is generating functions dynamically at runtime.

Closures are more static, similar to nested functions.  In fact
the main difference between a nested function and a closure is
that closures are written in line at the point of use.

Imagine a C-like language with nested functions, eg

    int
    outer_function(){
	int
	inner_function(){
	    /* inner function body ... */
	}

	needs_pointer_to_function( inner_function );
    }

A closure is like a function except the code is written directly where
it's needed:

    int
    outer_function(){

	needs_pointer_to_function( []{ /* inner function body ... */ } );
    }

The Perl example (as I understand it - I'm not a Perl expert) is
generating function names (and I guess function bodies) dynamically
and hooking those functions into the environment.

Closures don't do any dynamic generation.  The code is compiled
just like a function body, and the value of the closure is stored
in an ordinary variable, like pointer-to-function, except closures
have two pointers - one for the code body, one for the execution
environment - rather than just one.



Russell Borogove <kaleja estarcion.com> writes:

 Pavel \"EvilOne\" Minayev wrote:
 
 <la7y6nvo shamko.com> wrote in message
 news:s7cbsieubtt.fsf michael.shamko.com...
 
 I'd like to talk about the idea of having closures in D.

In fact, after reading this one more time, I really can't understand HOW can this be implemented in a non-interpreted language like D. Nor do I see any advantages of it...

Closures can be used as a form of generic programming, but in a slightly different way than C++ templates. Perl has them and makes them very convenient to use; Perl has the advantage that the run-time environment is guaranteed to include a Perl compiler[1]. Here's a trivial example of a closure in Perl: for my $color (qw[red yellow orange green blue purple violet]) { *$color = sub { qq<<FONT COLOR="\U$color\E"> _</FONT>> }; } Don't worry about the hairballs, that's just Perl. :) This automatically generates functions/subroutines red(), yellow(), orange()... etc., which generate some HTML to render strings in the specified color. In C/C++, you could approach this with #defined macros (except for the trick where function red() generates the color tag "RED"), but closures offer quite a bit more power than this. The above example in and of itself could be evaluated completely at compile time, but you could hypothetically replace the list of color names above with a list of arbitrary dynamic text strings. At this point, you _have_ to have dynamic binding and a compiler in the runtime. My guess is that this need is incompatible with the desire to have D be easy to implement. -RB [1] Inasmuch as perl is a compiled language.

Nov 07 2001
prev sibling parent "Patrick Down" <pdown austin.rr.com> writes:
The following is a paper about adding closures to C++.  Unfortunately
there method does not seem to support closures that persist beyond
the lifetime of the activation record of the enclosing function.

http://people.debian.org/~karlheg/Usenix88-lexic.pdf

<la7y6nvo shamko.com> wrote in message
news:s7cbsieubtt.fsf michael.shamko.com...
 I'd like to talk about the idea of having closures in D.  For those
 who aren't used to the term closure, a closure is something like a
 pointer to function, and usually is created by a syntactic construct
 that appears in source code in the form of an anonymous "code
 literal".


 Closures in Smalltalk
 =====================

 I'll start by talking about closures in Smalltalk, which calls them
 "Blocks".  (I'm assuming people can read Smalltalk syntax, which may
 be a bad assumption, but that's what I'm assuming.)

 A block in Smalltalk is code that appears between square brackets.
 An example would be

     symbolTable at: identifier ifAbsent: [ defaultSymbol ].

 The '[ defaultSymbol ]' is a block that is executed only if the
 'identifier' key is not present in 'symbolTable'.

 This example isn't very motivating, because defaultSymbol is just
 a value.  Consider this however

     symbolTable at: identifier ifAbsent: [ UndeclaredSymbol new ].

 The '[ UndeclaredSymbol new ]' block only gets executed if the
 identifier is not present.  This could be useful because making a
 new UndeclaredSymbol might involve lots of computation.

 The examples above return values, but blocks can also have side
 effects.  For example,

     symbolTable at: identifier ifAbsent: [
 undeclaredIdentifiers add: identifier.
 UndeclaredSymbol new
     ].

 Notice that the last expression evaluated in a block is what is
 returned for the value of the block.  (Please, no flames about
 bracket placement style - I know some people prefer the open
 bracket to go on a line by itself.)

 Blocks can also take arguments, for example -

     symbolTable keysAndValuesDo: [  :identifier  :entry  |
         output nextPut:
     '(', identifier printString, '->', entry printString, ')'.
     ].

 which causes a printable form of a symbol table to be added to
 the 'output' object.

 Blocks get much of their power from being able to access local
 context.  In an example above, the block contains references to
 'identifier' and 'undeclaredIdentifiers', which might be instance
 variables or perhaps local variables.  This ability puts blocks on an
 equal footing with the more usual if/then/else and looping control
 structures.  (In fact these "built-in" control structures are
 themselves implemented using blocks in Smalltalk, but that point is
 incidental - the main thing is that having access to local context
 gives an important kind of expressive power.)

 Incidentally, blocks can declare variables local to the block.
 (I'm not going to show an example for this.)

 Blocks also have a capability that might be surprising to people
 used to C, namely the ability to return (a value) from the method
 that contains the block.  So for example

     referencesTo: identifier
         "Answer a collection of all references to identifier"

 | entry |  "this is a local variable"

 entry := self at: identifier ifAbsent: [ ^ Set new ].
 ^ programSource referencesToEntry: entry

 The ^ in Smalltalk is like 'return' in C.  The block '[ ^ Set new ]'
 will, if executed, immediately return a value from the 'referencesTo:'
 method, and the following statement will not be executed at all.

 Here again, the ability to return a value from the enclosing function
 is one that adds expressive power, and puts blocks on an equal footing
 with built-in control structures.

 Blocks in Smalltalk are fairly representative of closures that are
 available in other programming languages, taking into account that
 Smalltalk is more of an imperative language than a functional language
 (in the sense of "functional programming").


 Closures in D
 =============

 For the sake of discussion let's assume that closures will be included
 in D.  What are the implications of this?  Clearly there are syntax
 questions (about which more later), and there are questions about what
 capabilities should be supported.  For the moment let's use

     [ ParameterList ]{ FunctionBody }

 as the syntax for a closure in D, without being too particular about
 types or return values, and look at semantic issues.

 First, the body of a closure is the same as that for a function body,
 which means closures can declare variables local to the closure
 activation.

 Second, the code in the body of the closure should have access to the
 context of the surrounding code, notably instance variables and local
 variables.  Instance variables aren't a problem (just have the closure
 keep a pointer to the object in question), but local variables are a
 little trickier, since a closure can outlive the stack frame of the
 method that created it.  This could be handled by taking any variables
 used in closure bodies and allocating them in a heap object and having
 the closure hang on to a pointer to that.  Yes, there is a cost
 associated with doing that allocation, but only for the cases where
 there are closures that reference local variables.

 [Note added during editing - People who are used to stack-based local
 variable frames may be surprised by the idea that local variables
 might outlive the function activation.  Rather than get into that
 in too much detail here I think a follow-up post is a better idea.
 Expect that shortly.]

 Third, recursion - as with ordinary functions, closures need a
 stack frame for their local variables so recursion can occur.
 A closure activation is not the same as a closure.

 Fourth, nested closures.  Closure bodies can contain other closure
 bodies.  This brings with it all the usual machinery needed for
 nested functions, in particular accessing local variables of
 containing closures (when that happens).  As with functions,
 closures that have local variables accessed by contained closures
 would allocate those local variables in an object in the heap.

 Fifth, interchangeability with function pointers.  Getting a closure
 to masquerade as a function pointer is a little tricky (because a
 function pointer is only one pointer, whereas a closure is two
 pointers - a pointer to code and a pointer to context that the code
 runs in).  Maybe it's useful to do that, I don't know.  But going the
 other way is easy - turning a function pointer into a closure just
 means adding a null pointer for execution context, and perhaps a dummy
 closure body that simply invokes through the function pointer.  The
 more closures and function pointers are interchangeable (as far as the
 function getting them as arguments is concerned), the more the
 language follows the "Law of Least Astonishment".

 Incidentally, closures being interchangeable with function pointers
 means closures should have all the same parameter passing modes.
 In particular, if functions have OUT and INOUT parameters, then
 closures should have them.

 Sixth, invoking a closure.  How about the same syntax as that
 for invoking through a function pointer?

     (*closure_parameter)( argument, argument, argument, ... )

 Seventh, closure returns.  It's important to distinguish between
 returning from a closure and returning from a function body containing
 a closure.  Blocks in Smalltalk use the usual return syntax (^) to
 indicate return from the containing method, and use the last
 expression evaluated as the return value for the block.  Should there
 be a way to return a value from a closure out of the middle of the
 closure body?  Should 'return' mean return from the closure or return
 from the function?  I can see arguments on both sides.

 For nested closures, you can imagine an "intermediate" return that
 returns from a containing closure but not the outermost function body.
 Blocks in Smalltalk don't include this capability (innermost return or
 outermost return are the only choices);  I have no experience with it,
 so I don't know how useful it is.  But if D is to include multi-level
 breaks, it probably should also include multi-level returns.

 Eighth, other control structures.  Consider the following code
 fragment -

     while(  p < p_limit  ){
 candidate = *p;
 candidate.enumerate_parts(
     [ part ]{
         if(  part.passesTest()  )  break;
     }
 );
 p ++;
     }

 Notice what's happening.  The closure is testing a value, and in the
 event that the test succeeds, the 'break' breaks out of the while in
 the function body outside the closure - a non-local transfer of
 control.  There are similar examples for 'continue' and (dare I say
 it) 'goto'.  These control structures aren't that different from
 uplevel returns in terms of what mechanisms are needed.

 Notice that any non-local control transfer - either a return out of or
 a break/continue/goto into - a function body containing a closure can
 fail at runtime if that function activation has already returned.
 This is true for intermediate (closure) function bodies as well as
 outermost function bodies.  Presumably such a case would generate
 some kind of exception.

 Despite the difficulties of these exceptional conditions (and the
 reactions I'm sure some people will have at the idea of using 'goto'
 out of a closure), I would argue that if the language is going to be
 orthogonal then as much as possible closures should look like the
 regular built-in control structures and non-local control transfers
 should be supported.


 Syntax for Closures
 ===================

 For invoking a closure - how about using the pointer to function
 notation

     (*closure)( argument, argument, argument, ... )

 Note:  personally I see nothing wrong with using the unadorned
 function call notation for pointers-to-function

     function_pointer( argument, argument, argument, ... )

 and would use that for closures if it were allowed for function
 pointers, but I don't want to start an argument on that topic.

 For declaring a closure variable - this should be similar (not
 identical) to declaring a function pointer.  How will function
 pointers be declared in D?

 For writing a closure body - well I like the syntax I've been
 using, assuming it doesn't cause too many problems:

     [  ParameterList  ]  {  FunctionBody  }

 The types of the parameters and the return value need to go somewhere,
 which is sort of a pain.  The parameter types might go in the
 parameter list, much like function definitions, e.g.,

     [ int a, int b ]  {  a < b ? a : b  }

 but for the return type, I'm at more of a loss.  Perhaps other people
 will have some suggestions.


 Final Remarks
 =============

 I think closures should be considered for D.  I've tried to lay out
 the basic issues and implications of D having closures.  I can say
 based on experience that closures are extremely useful - given a
 choice between closures and exception handling, I think closures
 are more generally useful.  Also, I think having closures in some
 form is a prerequisite to writing a good set of Collection classes,
 which is a large part of the motivation for having some form of
 generics (or templates).

Nov 09 2001