www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Explicit TCE

reply "Tyler Jameson Little" <beatgammit gmail.com> writes:
I've read a few threads discussing tail call elimination, but 
they all wanted the spec to articulate specific circumstances 
where tail call elimination is required.  Has there been any 
thought to adding syntax to explicitly state tail call 
elimination?

D could use something like Newsqueak's become keyword. If you're 
not familial with Newsqueak, become is just like a return, except 
it replaces the stack frame with the function that it calls.  
This was briefly mentioned years ago in this forum, but the 
become keyword was ignored:

     
http://www.digitalmars.com/d/archives/digitalmars/D/Rob_Pike_s_Newsqueak_-_some_good_concepts_53511.html

I think D could do something like this:

     int fact(int n, int accum = 1) {
         if (n == 0) {
             return accum
         }
         // become means return, but guarantees TCE
         become fact(n - 1, accum * n)
     }

DMD should optimize this already, but explicitly stating become 
is a key to the compiler that the user wants this call to be 
eliminated.  Then, more interesting things can be implemented 
more simply, like a state machine:

     void stateA() {
         become stateB();
     }

     void stateB() {
         become stateC();
     }

     void stateC() {
         return;
     }

     void main() {
         become stateA();
     }

This would only take a single stack frame. If there were 
conditionals in there for branching, this could end up with a 
stack overflow because DMD does not support complicated TCE, only 
simple recursive TCE.

The become keyword would probably have to have these properties:

     * statement after become must only be a function call (can't 
do foo() + 3)
     * scope() needs to be handled appropriately for functions 
with become

There might be others. I'm not sure how D handles stack sizes, so 
this may be an issue as well if stack sizes are determined at 
runtime. A requirement could be that functions called with become 
need to have a static stack size, since this stack might never be 
collected (in the case of an infinite state machine).

Adding this feature wouldn't cost much, but it would add a ton of 
functional power.
Oct 12 2012
next sibling parent =?UTF-8?B?QWxleCBSw7hubmUgUGV0ZXJzZW4=?= <alex lycus.org> writes:
On 12-10-2012 19:29, Tyler Jameson Little wrote:
 I've read a few threads discussing tail call elimination, but they all
 wanted the spec to articulate specific circumstances where tail call
 elimination is required.  Has there been any thought to adding syntax to
 explicitly state tail call elimination?

 D could use something like Newsqueak's become keyword. If you're not
 familial with Newsqueak, become is just like a return, except it
 replaces the stack frame with the function that it calls. This was
 briefly mentioned years ago in this forum, but the become keyword was
 ignored:

 http://www.digitalmars.com/d/archives/digitalmars/D/Rob_Pike_s_Newsqueak_-_some_good_concepts_53511.html


 I think D could do something like this:

      int fact(int n, int accum = 1) {
          if (n == 0) {
              return accum
          }
          // become means return, but guarantees TCE
          become fact(n - 1, accum * n)
      }

 DMD should optimize this already, but explicitly stating become is a key
 to the compiler that the user wants this call to be eliminated.  Then,
 more interesting things can be implemented more simply, like a state
 machine:

      void stateA() {
          become stateB();
      }

      void stateB() {
          become stateC();
      }

      void stateC() {
          return;
      }

      void main() {
          become stateA();
      }

 This would only take a single stack frame. If there were conditionals in
 there for branching, this could end up with a stack overflow because DMD
 does not support complicated TCE, only simple recursive TCE.

 The become keyword would probably have to have these properties:

      * statement after become must only be a function call (can't do
 foo() + 3)
      * scope() needs to be handled appropriately for functions with become

 There might be others. I'm not sure how D handles stack sizes, so this
 may be an issue as well if stack sizes are determined at runtime. A
 requirement could be that functions called with become need to have a
 static stack size, since this stack might never be collected (in the
 case of an infinite state machine).

 Adding this feature wouldn't cost much, but it would add a ton of
 functional power.

I'm a big fan of explicit, guaranteed TCE. However, the primary problem with this approach is a really mundane one: The major compiler back ends (GCC and LLVM) don't have any means of guaranteeing TCE... -- Alex Rønne Petersen alex lycus.org http://lycus.org
Oct 12 2012
prev sibling next sibling parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 12-Oct-12 21:29, Tyler Jameson Little wrote:
 I've read a few threads discussing tail call elimination, but they all
 wanted the spec to articulate specific circumstances where tail call
 elimination is required.  Has there been any thought to adding syntax to
 explicitly state tail call elimination?

 D could use something like Newsqueak's become keyword. If you're not
 familial with Newsqueak, become is just like a return, except it
 replaces the stack frame with the function that it calls. This was
 briefly mentioned years ago in this forum, but the become keyword was
 ignored:

I suspect it's so called switch-call or forced tail call. I proposed something to the exact effect of the below but to reuse the 'goto' keyword.
 http://www.digitalmars.com/d/archives/digitalmars/D/Rob_Pike_s_Newsqueak_-_some_good_concepts_53511.html


 I think D could do something like this:

      int fact(int n, int accum = 1) {
          if (n == 0) {
              return accum
          }
          // become means return, but guarantees TCE
          become fact(n - 1, accum * n)
      }

 DMD should optimize this already, but explicitly stating become is a key
 to the compiler that the user wants this call to be eliminated.  Then,
 more interesting things can be implemented more simply, like a state
 machine:

      void stateA() {
          become stateB();
      }

      void stateB() {
          become stateC();
      }

      void stateC() {
          return;
      }

      void main() {
          become stateA();
      }

 This would only take a single stack frame. If there were conditionals in
 there for branching, this could end up with a stack overflow because DMD
 does not support complicated TCE, only simple recursive TCE.

 The become keyword would probably have to have these properties:

      * statement after become must only be a function call (can't do
 foo() + 3)
      * scope() needs to be handled appropriately for functions with become

 There might be others. I'm not sure how D handles stack sizes, so this
 may be an issue as well if stack sizes are determined at runtime. A
 requirement could be that functions called with become need to have a
 static stack size, since this stack might never be collected (in the
 case of an infinite state machine).

 Adding this feature wouldn't cost much, but it would add a ton of
 functional power.

The use case in general is threaded code. In particular fast virtual machines. -- Dmitry Olshansky
Oct 12 2012
parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 12-Oct-12 22:17, Tyler Jameson Little wrote:
 No idea what you are talking about.

I'm not sure which part wasn't clear, so I'll try to explain myself. Please don't feel offended if I clarify things you already understand.

 An optimizable tail call must simply be a function call. The current
 stack frame would be replaced with the new function, so anything more
 complex than a simple function call would require some stack from the
 preceding function to stick around in the new function, thus requiring
 the old stack to stick around.

 For example, te following is not optimizable the old stack (the one with
 3) needs to be maintained until foo() returns, which is not TCE.

 return foo() * 3


 Since the old stack won't be around anymore, that leaves us with in a
 sticky situation with regard to scope():

 http://dlang.org/statement.html#ScopeGuardStatement

 If the current stack is going to be replaced with data from another
 function call, the behavior of scope() is undefined. The scope that
 scope() was in has now been repurpose, but the scope is still kind of
 there. If scope() is allowed, they must be executed just before the tail
 call, otherwise it will be overwritten (or it has to stick around until
 the actual stack frame is cleared.

Consider:
 void a() {
    become b();
 }

 void b() {
    // when does this get called?
    scope(exit) writeln("exited");
    become a();
 }

 If we allow scope(), then the line should be written before the call to
 a(). If we don't, then this is a compile time error. I like disallowing
 it personally, because if the scope(exit) call frees some memory that is
 passed to a, the programmer may think that it will be called after a
 exits, which may not be the case.

 void a(void* arr) {
    // do something with arr
    become b();
 }

 void b() {
    void* arr = malloc(sizeof(float) * 16);
    scope(exit) free(arr);
    become a(arr);
 }

 I just see this as being a problem for those who don't fully understand
 scoping and TCE.

 My mention of overhead was just how complicated it would be to
 implement. The general algorithm is (for each become keyword):

 * determine max stack size (consider all branches in all recursive
 contexts)
 * allocate stack size for top-level function
 * do normal TCE stuff (use existing stack for new call)

What's wrong with just allocating a new stack _in-place_ of the old? In other words make 'become' synonym for 'reuse the current stack frame'. Effectively you still stay in constant space that is maximum of all functions being called.
 The stack size should be known at compile time for cases like the one
 above (a calls b, b calls a, infinitely) to avoid infinitely expanding
 stack. A situation like this is a memory optimization, so forcing
 guaranteed stack size puts an upper-bound on memory usage, which is the
 whole point of TCE. If the stack is allowed to grow, there is
 opportunity for stack overflow.

Right.
 My use case for this is a simple compiler, but I'm sure this could be
 applied to other use cases as well.  I'd like to produce code for some
 BNF-style grammar where each LHS is a function. Thus, my state machine
 wouldn't be a huge, unnatural switch statement that reads in the current
 state, but a series of code branches that 'become' other states, like an
 actual state machine.

I see nice staff. My use case is optimizing virtual machine, the one inside std.regex primarily. -- Dmitry Olshansky
Oct 12 2012
parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
On 12-Oct-12 22:49, Tyler Jameson Little wrote:
 That would work too. If scope() is disallowed, it doesn't matter where
 the stack comes from. It's only slightly cheaper to reuse the current
 stack (CPU), but making a new one would be lighter on memory.

 I see nice staff. My use case is optimizing virtual machine, the one
 inside std.regex primarily.

Yeah, that is a great example! I've read some bug reports about std.regex using a ton of memory, especially with CTFE.

Hey, that's dmd (compiler) using a ton of memory, not std.regex :( It actually flies with only a modest set of ram after CTFE (or rather 'if') succeeds that is :) Since regex is by
 definition a state machine, this would be a particularly elegant fit
 (granted, backreferences et al break that model, but it's still a nice
 metaphor).

Yeah, without backreferences it's a state machine. Still it's NFA (non-deterministic) as no DFA would do if you want to get things like captures of sub matches etc. Either way the remarkable giant switch is present ;)
 The main problem I see is working with other compilers like GCC/LLVM. If
 this can be done on those compilers, I don't see any major hurdle to
 getting this implemented.

Perhaps the biggest one would be convincing GCC/LLVM devs to accept patches :) -- Dmitry Olshansky
Oct 12 2012
prev sibling next sibling parent "Tyler Jameson Little" <beatgammit gmail.com> writes:
 I'm a big fan of explicit, guaranteed TCE.

 However, the primary problem with this approach is a really 
 mundane one: The major compiler back ends (GCC and LLVM) don't 
 have any means of guaranteeing TCE...

Ugh... I thought that might be a problem. I don't know too much about GCC/LLVM, but I saw 'tailcallelim' for LLVM: http://llvm.org/docs/Passes.html#tailcallelim GCC seems to support it in 4.x: arxiv.org/pdf/1109.4048 These look promising, so I wouldn't completely rule out the possibility of doing it in GCC/LLVM. Perhaps someone more knowledgeable about GCC/LLVM could comment? I would really like to see D have this feature (then I can stop daydreaming about LISP).
Oct 12 2012
prev sibling next sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
Tyler Jameson Little:

 D could use something like Newsqueak's become keyword. If 
 you're not familial with Newsqueak, become is just like a 
 return, except it replaces the stack frame with the function 
 that it calls.

Are you talking about CPS? http://en.wikipedia.org/wiki/Continuation_passing_style
 DMD should optimize this already, but explicitly stating become 
 is a key to the compiler that the user wants this call to be 
 eliminated.  Then, more interesting things can be implemented 
 more simply, like a state machine:

     void stateA() {
         become stateB();
     }

     void stateB() {
         become stateC();
     }

     void stateC() {
         return;
     }

     void main() {
         become stateA();
     }

Seems nice.
 I'm not sure how D handles stack sizes, so this may be an issue 
 as well if stack sizes are determined at runtime.

D doesn't currently support C99 VLAs, but it supports alloca(), so stack frames are sized dynamically. But maybe this is not a big problem for CPS. Bye, bearophile
Oct 12 2012
prev sibling next sibling parent "Tyler Jameson Little" <beatgammit gmail.com> writes:
 No idea what you are talking about.

I'm not sure which part wasn't clear, so I'll try to explain myself. Please don't feel offended if I clarify things you already understand. An optimizable tail call must simply be a function call. The current stack frame would be replaced with the new function, so anything more complex than a simple function call would require some stack from the preceding function to stick around in the new function, thus requiring the old stack to stick around. For example, te following is not optimizable the old stack (the one with 3) needs to be maintained until foo() returns, which is not TCE. return foo() * 3 Since the old stack won't be around anymore, that leaves us with in a sticky situation with regard to scope(): http://dlang.org/statement.html#ScopeGuardStatement If the current stack is going to be replaced with data from another function call, the behavior of scope() is undefined. The scope that scope() was in has now been repurpose, but the scope is still kind of there. If scope() is allowed, they must be executed just before the tail call, otherwise it will be overwritten (or it has to stick around until the actual stack frame is cleared. Consider: void a() { become b(); } void b() { // when does this get called? scope(exit) writeln("exited"); become a(); } If we allow scope(), then the line should be written before the call to a(). If we don't, then this is a compile time error. I like disallowing it personally, because if the scope(exit) call frees some memory that is passed to a, the programmer may think that it will be called after a exits, which may not be the case. void a(void* arr) { // do something with arr become b(); } void b() { void* arr = malloc(sizeof(float) * 16); scope(exit) free(arr); become a(arr); } I just see this as being a problem for those who don't fully understand scoping and TCE. My mention of overhead was just how complicated it would be to implement. The general algorithm is (for each become keyword): * determine max stack size (consider all branches in all recursive contexts) * allocate stack size for top-level function * do normal TCE stuff (use existing stack for new call) The stack size should be known at compile time for cases like the one above (a calls b, b calls a, infinitely) to avoid infinitely expanding stack. A situation like this is a memory optimization, so forcing guaranteed stack size puts an upper-bound on memory usage, which is the whole point of TCE. If the stack is allowed to grow, there is opportunity for stack overflow. My use case for this is a simple compiler, but I'm sure this could be applied to other use cases as well. I'd like to produce code for some BNF-style grammar where each LHS is a function. Thus, my state machine wouldn't be a huge, unnatural switch statement that reads in the current state, but a series of code branches that 'become' other states, like an actual state machine. For example: A := B | C | hello B := bye | see ya C := go away void A() { char next = getNext(); if (next == 'b' || next == 's') { become B(); } if (next == 'g') { become C(); } if (next == 'h') { // consume until hello is found, or throw exception // then put some token on the stack } } void B() { // consume until 'bye' or 'see ya' } void C() { // consume until 'go away' } This would minimize memory use and allow me to write code that more closely matches the grammar. There are plenty of other use cases, but DSLs would be very easy to implement with TCE.
Oct 12 2012
prev sibling next sibling parent "Tyler Jameson Little" <beatgammit gmail.com> writes:
On Friday, 12 October 2012 at 18:02:57 UTC, bearophile wrote:
 Tyler Jameson Little:

 D could use something like Newsqueak's become keyword. If 
 you're not familial with Newsqueak, become is just like a 
 return, except it replaces the stack frame with the function 
 that it calls.

Are you talking about CPS? http://en.wikipedia.org/wiki/Continuation_passing_style

I don't think it would necessitate CPS, but that is a nice side effect. I'm thinking more of a recursive function call that may or may not return. For example, a process that shovels data between two network connections. If the data never stops, the function will never return. If there's some kind of a problem, then it could return with that error, and be restarted when that problem is fixed. All of this could happen with a series of function calls that use the same stack. void handleIncommingData() { if (error) { // returns directly to manageLongRunningProcess return; } // do something useful become longRunningProcess(); } void longRunningProcess() { become handleIncommingData(); } void manageLongRunningProcess() { longRunningProcess(); // there was a problem, so fix it .... // try again manageLongRunningProcess(); } Exceptions are not needed, so these can be nothrow functions, and this implementation is simpler than some complex while loop, while having the same memory footprint. CPS would make things like imitating javascript's setTimeout/setInterval possible. I don't think this is a major benefit for D because the parallelism/concurrency support is already pretty awesome. The main benefit is for implementing things like lexical analyzers (or tokenizers, whatever), which don't really depend on previous states and can emit tokens. This allows for efficient representation of recursive problems, that call functions circularly (a -> b -> c -> a -> b ...), like a state machine. I think it just allows an extra level of expressiveness without a backwards incompatible change to the language. True, you can express this same idea with trampolining, but that isn't as fun: http://stackoverflow.com/a/489860/538551 http://en.wikipedia.org/wiki/Tail-recursive_function#Through_trampolining There are still some problems that I think a LISP language would make more sense for, and for those problems, it would be great to express them in D with my other code.
 DMD should optimize this already, but explicitly stating 
 become is a key to the compiler that the user wants this call 
 to be eliminated.  Then, more interesting things can be 
 implemented more simply, like a state machine:

    void stateA() {
        become stateB();
    }

    void stateB() {
        become stateC();
    }

    void stateC() {
        return;
    }

    void main() {
        become stateA();
    }

Seems nice.

I'm glad you think so =D
 I'm not sure how D handles stack sizes, so this may be an 
 issue as well if stack sizes are determined at runtime.

D doesn't currently support C99 VLAs, but it supports alloca(), so stack frames are sized dynamically. But maybe this is not a big problem for CPS.

Well, dynamic stack frames aren't strictly a bad thing for CPS, it just removes the memory use guarantee. There's already a huge memory gain from using TCE, I just don't know debugging would be done if a function keeps adding to and passing a dynamic array.
Oct 12 2012
prev sibling next sibling parent "Tyler Jameson Little" <beatgammit gmail.com> writes:
 My mention of overhead was just how complicated it would be to
 implement. The general algorithm is (for each become keyword):

 * determine max stack size (consider all branches in all 
 recursive
 contexts)
 * allocate stack size for top-level function
 * do normal TCE stuff (use existing stack for new call)

What's wrong with just allocating a new stack _in-place_ of the old? In other words make 'become' synonym for 'reuse the current stack frame'. Effectively you still stay in constant space that is maximum of all functions being called.

That would work too. If scope() is disallowed, it doesn't matter where the stack comes from. It's only slightly cheaper to reuse the current stack (CPU), but making a new one would be lighter on memory.
 I see nice staff. My use case is optimizing virtual machine, 
 the one inside std.regex primarily.

Yeah, that is a great example! I've read some bug reports about std.regex using a ton of memory, especially with CTFE. Since regex is by definition a state machine, this would be a particularly elegant fit (granted, backreferences et al break that model, but it's still a nice metaphor). The main problem I see is working with other compilers like GCC/LLVM. If this can be done on those compilers, I don't see any major hurdle to getting this implemented.
Oct 12 2012
prev sibling next sibling parent "Tyler Jameson Little" <beatgammit gmail.com> writes:
 Hey, that's dmd (compiler) using a ton of memory,  not 
 std.regex :(
 It actually flies with only a modest set of ram after CTFE (or 
 rather 'if') succeeds that is :)

My bad. Even then, TCE wouldn't hurt.
 The main problem I see is working with other compilers like 
 GCC/LLVM. If
 this can be done on those compilers, I don't see any major 
 hurdle to
 getting this implemented.

Perhaps the biggest one would be convincing GCC/LLVM devs to accept patches :)

I think getting Walter Bright on board is the best starting point. If he likes the idea, I'm sure we can work out way with the GCC/LLVM devs. I saw some basic signs (noted earlier) that this may be a non-issue, as the functionality may already be there. I'll keep looking and see if I can find a definitive answer for those compilers. Would support of one of the compilers be enough, or would both be required to get this in the formal language spec?
Oct 12 2012
prev sibling next sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
Dmitry Olshansky:

 Perhaps the biggest one would be convincing GCC/LLVM devs to 
 accept patches :)

From what I've seen LLVM devs seem open enough to good patches. They have accepted several changes to allow LLVM to become the back-end of GHC (Haskell compiler), and generally my enhancement requests to improve LDC1 life were well accepted. They have even kept open a large request enhancement of mine to optimize D vector ops in the back-end. Bye, bearophile
Oct 12 2012
prev sibling next sibling parent "David Nadlinger" <see klickverbot.at> writes:
On Friday, 12 October 2012 at 17:39:53 UTC, Alex Rønne Petersen 
wrote:
 However, the primary problem with this approach is a really 
 mundane one: The major compiler back ends (GCC and LLVM) don't 
 have any means of guaranteeing TCE...

LLVM shouldn't be as big a problem – there is some support for guaranteed TCO in order to make implementations of some of the functional languages possible. I know that you can force LLVM to tail-call everything it possibly can (which in consequence horribly breaks the ABI), but I am not sure right now how fine-grained you can control that mechanism. Also don't forget that some calling conventions don't lend themselves particularly well for doing efficient tail calls. David
Oct 12 2012
prev sibling parent "Tyler Jameson Little" <beatgammit gmail.com> writes:
On Friday, 12 October 2012 at 20:23:00 UTC, David Nadlinger wrote:
 On Friday, 12 October 2012 at 17:39:53 UTC, Alex Rønne 
 Petersen wrote:
 However, the primary problem with this approach is a really 
 mundane one: The major compiler back ends (GCC and LLVM) don't 
 have any means of guaranteeing TCE...

LLVM shouldn't be as big a problem – there is some support for guaranteed TCO in order to make implementations of some of the functional languages possible. I know that you can force LLVM to tail-call everything it possibly can (which in consequence horribly breaks the ABI), but I am not sure right now how fine-grained you can control that mechanism. Also don't forget that some calling conventions don't lend themselves particularly well for doing efficient tail calls. David

I found this: http://llvm.org/docs/CodeGenerator.html#tail-call-optimization http://llvm.org/docs/CodeGenerator.html#target-feature-matrix It seems that llvm won't be a problem. I've never worked with LLVM (or any compiler for that matter) at this low of a level, but I assume that the front-end produces code that looks like the provided code snippet in the first link. If that's the case, then we can basically guarantee that LLVM will do what we expect, as long as we can guarantee that all callers and callees use "fastcc". I'm not 100% on the implications of this, but it should work. As for GCC, the situation seems less hopeful. I found this thread about GUILE, but it did mention GCC's lack of support for tail calls. This was april of last year, so maybe things have improved. The thread does mention that the GCC devs would be open to suggestions, but it seems like this might be a harder fought battle than for LLVM. http://lists.gnu.org/archive/html/guile-devel/2011-04/msg00055.html LLVM should be sufficient though, right? GDC can just outright reject explicit TCO for now until it supports proper TCO. Maybe the GUILE mailing list would be a good place to start, since there may be efforts already there. What steps would need to happen for this to become a reality? Here's my list: 1. Get Walter Bright/Andrei Alexandrescu on board 2. Verify that it will work with LLVM 3. Get it working in DMD 4. Get it working in LDC 5. Work with GCC devs Is there enough interest in this to implement it? I really don't know DMD or LLVM at all, so I don't know how big of a project this is.
Oct 12 2012