www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - A problem with generators

reply bearophile <bearophileHUGS lycos.com> writes:
D currently has two ways to define a generator, the first is the older one with
opApply (And opApplyReverse):
int opApply(int delegate(ref Type [, ...]) dg);


The syntax of opApply is terrible: hard to remember, bug-prone, noisy,
intrusive. And this syntax doesn't even gain performance, because dmd is not
very good at inlining here.
Several times in the past I have said that a very good syntax for this is the
Python one:

def foo():
  yield 5

It's really easy to use and remember, not bug prone. This possible syntax is
about 75 times better than the current opApply:

yield(int) foo() {
  yield 5;
}


That's sugar for something like:

struct foo {
    int opApply(int delegate(ref int) dg) {
        int result;
        int r = 5;
        result = dg(r);
        if (result)
            goto END;
      END:
        return result;
    }
}


Python syntax was good enough that C# has copied it.


The other way to create a struct/class generator in D2 is with the Range
protocol, with the methods that ask for the next item, etc. I have seen that
such generators sometimes are harder to write, because you have to manage the
state yourself, but they can be more efficient and they are more powerful.

Both ways are useful, because they are useful for different situations. They
are one the inverted-out version of the other.


The implementations of generators in C# and Python (and D, but not Lua) suffer
of a problem, that increases the computational complexity of the code when
there is a nested generator.

And both Python and C# have found similar solutions that improve both the
syntax and the performance (both languages have not yet implemented it).

Python version:
http://python.org/dev/peps/pep-0380/

With that this code:

def foo():
  for x in some_iterable:
    yield x


Becomes:

def foo():
  yield from some_iterable


In C# it can be called "yield foreach":

http://kirillosenkov.blogspot.com/2007/10/yield-foreach.html

http://connect.microsoft.com/VisualStudio/feedback/details/256934/yield-return-to-also-yield-collections

http://blogs.msdn.com/wesdyer/archive/2007/03/23/all-about-iterators.aspx


As D2 will start coming out of its embryo state, it too will probably have to
fece this problem with iterators. In the meantime I hope the opApply syntax
will be replaced by something better (the yield I've shown).

Bye,
bearophile
Mar 17 2010
next sibling parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from bearophile (bearophileHUGS lycos.com)'s article
 D currently has two ways to define a generator, the first is the older one with

 int opApply(int delegate(ref Type [, ...]) dg);
 The syntax of opApply is terrible: hard to remember, bug-prone, noisy,

good at inlining here.
 Several times in the past I have said that a very good syntax for this is the

 def foo():
   yield 5
 It's really easy to use and remember, not bug prone. This possible syntax is

 yield(int) foo() {
   yield 5;
 }

The **BIG** advantage of the status quo, where the loop body is a delegate instead of being "magic", is that it enables lots of hackish things to be done, like running different iterations of the loop in different threads.
Mar 17 2010
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
dsimcha:
 The **BIG** advantage of the status quo, where the loop body is a delegate
instead
 of being "magic", is that it enables lots of hackish things to be done, like
 running different iterations of the loop in different threads.

That's a defect, not a quality ;-) Something like this: yield(int) foo() { yield 5; } Has advantages: - It's much simpler to remember, while I need to look in code snippets every time I want to write an opApply - It's shorter to write and to read, so you need less time to write code, you put less bugs in the code, and your program gets smaller. It saves several lines of code. It's easy to see if it's correct. - It's simpler to learn, python and C# programmers need almost zero time to learn it, and other programmers surely will find it simpler than the opAssign. And If you want to do hacks there's the other syntax, based on range iteration methods. As you say, the good thing of opApply is that it contains no magic, the syntax can almost be mapped (if you know D very well) to the assembly code the compiler will generate. But there's not enough sugar there. Anyway, D2 is finalized, so I don't think yield will be added soon. It's OK. But I think opApply is a hack that I don't like to see in D2. Go language shows that it's better to not have a feature than having a bad one, even if it's a very important feature like exceptions, classes, etc, because new features can be added later, while it's harder to remove them. Bye, bearophile
Mar 17 2010
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Adam D. Ruppe:
 It says exactly what it does, and enables something you couldn't do before.

Even assembly says what it does, but here I want more sugar :-) It helps.
 How is that a hack?

It shows too much the machinery, this can lead to bugs, requires more code, requires more memory to remember it, you need more time to read it and see if/where it's wrong.
 The Go language sucks.

In future it can suck less :-) And even a bad product or ignorant person can give useful suggestions. The authors of Go are not ignorant people: Robert Griesemer, Rob Pike and Ken Thompson. I am not even worth brushing their sandals :-) They are giving a lesson in almost the limits of minimalism in a modern language. Bye, bearophile
Mar 17 2010
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 03/17/2010 05:03 PM, bearophile wrote:
 Adam D. Ruppe:
 It says exactly what it does, and enables something you couldn't do
 before.

Even assembly says what it does, but here I want more sugar :-) It helps.
 How is that a hack?

It shows too much the machinery, this can lead to bugs, requires more code, requires more memory to remember it, you need more time to read it and see if/where it's wrong.
 The Go language sucks.

In future it can suck less :-) And even a bad product or ignorant person can give useful suggestions. The authors of Go are not ignorant people: Robert Griesemer, Rob Pike and Ken Thompson. I am not even worth brushing their sandals :-)

Me neither, but we should judge Go on its own merit.
 They are giving a lesson in
 almost the limits of minimalism in a modern language.

I honestly don't believe that's the case. If you're looking for such lessons, there are plenty of much better examples of minimalistic languages. Go is simply an unfinished language; it doesn't have the kind of features that make a minimalistic language be also comprehensive. There are problems for which Go does not have an answer yet and that's simply because an answer hasn't been thought of yet. From what I see so far, it's unlikely revolutionary solutions are right down the corner. Go is a rehash of old ideas - but fortunately fixes the typo in O_CREAT. Andrei
Mar 17 2010
prev sibling parent Walter Bright <newshound1 digitalmars.com> writes:
bearophile wrote:
 Go language
 shows that it's better to not have a feature than having a bad one, even if
 it's a very important feature like exceptions, classes, etc, because new
 features can be added later, while it's harder to remove them.

I don't agree that go shows that at all, we will only know what it shows after it's been in use for a few years.
Mar 17 2010
prev sibling next sibling parent bearophile <bearophileHUGS lycos.com> writes:
And by the way, you seem to have missed the purpose of my post: it was about
the problem of computational complexity of nested generators.

Bye,
bearophile
Mar 17 2010
prev sibling parent "Adam D. Ruppe" <destructionator gmail.com> writes:
On Wed, Mar 17, 2010 at 05:47:12PM -0400, bearophile wrote:
 But I think opApply is a hack that I don't like to see in D2.

It says exactly what it does, and enables something you couldn't do before. How is that a hack?
 Go language shows

The Go language sucks. -- Adam D. Ruppe http://arsdnet.net
Mar 17 2010
prev sibling next sibling parent reply "Adam D. Ruppe" <destructionator gmail.com> writes:
On Wed, Mar 17, 2010 at 05:26:14PM -0400, bearophile wrote:
 It's really easy to use and remember, not bug prone. This possible syntax is
about 75 times better than the current opApply:
 
 yield(int) foo() {
   yield 5;
 }

That's a *really* subjective statement. I understand pretty close to instantly what opApply does - sends your data to the function inside the loop. Even with your explanation, I still don't really get what yield does. -- Adam D. Ruppe http://arsdnet.net
Mar 17 2010
parent bearophile <bearophileHUGS lycos.com> writes:
Adam D. Ruppe:
 Even with your explanation, I still don't really get what yield does.

Try to read after the line: "That's sugar for something like:" You can see a generator that uses opApply that will be clear. It's a way to spit out things out of a struct/class, it's a way to define a generator, that is something that spits out a new item every time you ask it to, with a foreach. Bye, bearophile
Mar 17 2010
prev sibling next sibling parent reply Lutger <lutger.blijdestijn gmail.com> writes:
bearophile wrote:

 D currently has two ways to define a generator, the first is the older one
 with opApply (And opApplyReverse): int opApply(int delegate(ref Type [,
 ...]) dg);
 
 
 The syntax of opApply is terrible: hard to remember, bug-prone, noisy,
 intrusive. And this syntax doesn't even gain performance, because dmd is
 not very good at inlining here. Several times in the past I have said that
 a very good syntax for this is the Python one:
 
 def foo():
   yield 5
 
 It's really easy to use and remember, not bug prone. This possible syntax
 is about 75 times better than the current opApply:
 
 yield(int) foo() {
   yield 5;
 }
 
 
 That's sugar for something like:
 
 struct foo {
     int opApply(int delegate(ref int) dg) {
         int result;
         int r = 5;
         result = dg(r);
         if (result)
             goto END;
       END:
         return result;
     }
 }
 

Hi, is this generator thing really the same as opApply? I thought this is more like co-routines. Can yield be as efficient as current opApply? (assuming inlining will eventually happen where possible) The tango docs have/had a good discussion about opApply vs Iterator, where the author called opApply a visitor iirc. Unfortunately the part about generators was never finished. yield may be easy to understand once you understand it :) It's prettier for sure. But take a glimpse at this page on stackoverflow, the length of the answers should tell you that it's not as straightforward as you might think and that it's not only about the syntax: http://stackoverflow.com/questions/231767/can-somebody-explain-me-this- python-yield-statement
Mar 17 2010
parent bearophile <bearophileHUGS lycos.com> writes:
Lutger:
 Hi, is this generator thing really the same as opApply? I thought this is  
 more like co-routines.

In Python it's syntax for co-routines, but the syntax I have suggested for D is just sugar for the opApply.
Can yield be as efficient as current opApply? (assuming inlining will
eventually happen where possible)<

Currently opApply is not efficient with dmd. But LDC is able to often inline there, so I am happy. The yield as I mean it is as (in)efficient as the opApply. Note that where you need more performance you can reverse inside-out the iterator, and use the range iterator methods. If they are not virtual dmd has no problem in inlining them. And note that in 80-90% of your code you don't need 100% performance. There are many cases in your code where you need something that's handy, short, easy to remember and use, not error-prone, but you don't need to save few bits of time.
 yield may be easy to understand once you understand it :)

It contains more syntax sugar, so a system programmer can find it harder to understand what actually it's doing. This is a disadvantage. Too much syntax sugar gives syntax cancer :-) It's prettier for
 sure. But take a glimpse at this page on stackoverflow, the length of the 
 answers should tell you that it's not as straightforward as you might think 
 and that it's not only about the syntax:

I don't fully like the yield syntax of Python because they have first invented it for a simple but very useful purpose, and then they have badly extended it for coroutines too. So now you have a statement that's used almost as a variable name. Ugh :-P Regarding the semantics of python yield, it's not as simple as it was in the beginning, because they have bolted on it the coroutine semantics. And it contains a trap. That's probably why that thread on stack overflow is so long. This problem is not shared by my simpler yield I have suggested probably more than one year ago for D1. Do you think future D newbies will will less hard to learn the opApply compared to the yield? Anyway, the purpose of this post was not to discuss about yield syntax. The main problem with generators was not a problem about syntax, but to discuss the problem of computational complexity of nested generators. The links about C# show it very well. Bye, bearophile
Mar 17 2010
prev sibling next sibling parent reply =?ISO-8859-1?Q?Pelle_M=E5nsson?= <pelle.mansson gmail.com> writes:
On 03/17/2010 10:26 PM, bearophile wrote:
 D currently has two ways to define a generator, the first is the older one
with opApply (And opApplyReverse):
 int opApply(int delegate(ref Type [, ...]) dg);


 The syntax of opApply is terrible: hard to remember, bug-prone, noisy,
intrusive. And this syntax doesn't even gain performance, because dmd is not
very good at inlining here.
 Several times in the past I have said that a very good syntax for this is the
Python one:

 def foo():
    yield 5

 It's really easy to use and remember, not bug prone. This possible syntax is
about 75 times better than the current opApply:

 yield(int) foo() {
    yield 5;
 }


 That's sugar for something like:

 struct foo {
      int opApply(int delegate(ref int) dg) {
          int result;
          int r = 5;
          result = dg(r);
          if (result)
              goto END;
        END:
          return result;
      }
 }


 Python syntax was good enough that C# has copied it.


 The other way to create a struct/class generator in D2 is with the Range
protocol, with the methods that ask for the next item, etc. I have seen that
such generators sometimes are harder to write, because you have to manage the
state yourself, but they can be more efficient and they are more powerful.

 Both ways are useful, because they are useful for different situations. They
are one the inverted-out version of the other.


 The implementations of generators in C# and Python (and D, but not Lua) suffer
of a problem, that increases the computational complexity of the code when
there is a nested generator.

 And both Python and C# have found similar solutions that improve both the
syntax and the performance (both languages have not yet implemented it).

 Python version:
 http://python.org/dev/peps/pep-0380/

 With that this code:

 def foo():
    for x in some_iterable:
      yield x


 Becomes:

 def foo():
    yield from some_iterable


 In C# it can be called "yield foreach":

 http://kirillosenkov.blogspot.com/2007/10/yield-foreach.html

 http://connect.microsoft.com/VisualStudio/feedback/details/256934/yield-return-to-also-yield-collections

 http://blogs.msdn.com/wesdyer/archive/2007/03/23/all-about-iterators.aspx


 As D2 will start coming out of its embryo state, it too will probably have to
fece this problem with iterators. In the meantime I hope the opApply syntax
will be replaced by something better (the yield I've shown).

 Bye,
 bearophile

While I think it should be done properly with ranges and not opApply, I strongly support this syntactic sugar.
Mar 17 2010
next sibling parent bearophile <bearophileHUGS lycos.com> writes:
Pelle M.: 
 While I think it should be done properly with ranges and not opApply,

What do you mean? (There is already the syntax based on the range protocol, where you manage manually the state of your iterator. I have not suggested to remove that.) Bye, bearophile
Mar 17 2010
prev sibling parent bearophile <bearophileHUGS lycos.com> writes:
Pelle Månsson:
I strongly support this syntactic sugar.<

Note that syntax was invented on the spot, it's not perfect, as you have seen it's sugar for a struct, so what if you want a class? :-) Bye, bearophile
Mar 17 2010
prev sibling parent reply Norbert Nemec <Norbert Nemec-online.de> writes:
I really like that simplified, "yield"-based syntax. However, I believe 
that it conflicts fundamentally with D's confusing concept of generators.

Most other languages that offer generators (e.g. Python) view the 
generator as an object that spits out values one-by-one. This is very 
similar to a co-routine.

Purely for performance reasons, D approaches the issue in a backward 
way, passing the loop body as a delegate. Indeed, this may be faster 
than full-fledged co-routines, but it is rather difficult to map a 
simplified coroutine-like syntax based on 'yield' statements to this 
very different concept by mere "syntactic sugar". Furthermore, the D 
concept scales awkwardly to combining multiple generators in one loop or 
using nested generators.

In fact, there is a simple but powerful way to design purely stack-based 
generators, sticking to the straightforward concept of co-routines 
without the overhead of full co-routines and without the need for 
delegates (inspired by the Sather language):

Make a generator a co-routine but restrict its lifetime to the inside of 
a loop. When the loop starts, it calls all the contained generators to 
initialize their local data on the regular stack. When the generators 
yield, the stack-pointer is *not* reset (as it would be at a return) but 
kept at its value. The generator routines can therefore be called again 
within the same loop and find their local data still on the stack. When 
the loop finishes, the stack pointer is reset to its initial value and 
all the local data of all the contained generators is immediately released.

This design does not interfere with the regular use of the stack for 
regular function calls within the loop. Nested loops are possible, as 
are nested generators or loops that combine values from multiple 
generators. All with a straightforward syntax and all based on pure 
stack operations that are no more costly than regular function calls and 
can be inlined just as easily.

This concept even allows to place calls to generators at arbitrary 
positions inside a loop, rather than at the beginning only. The regular 
D loop:
     foreach(auto x;mygenerator) {
         do_something(x);
     }
could be written as something like
     loop {
         do_something(#mygenerator);
     }
where I have blindly introduced a "#" operator to signify a generator 
call that may either yield a value or break the loop. (I don't care 
about the exact syntax, but it should stick out somehow.

This can be generalised as
     loop {
         do_something();
         do_something_else(#mygenerator);
         yet_something_else(#anothergenerator);
     };
where each expression containing a # generator call may either yield a 
value or break the loop.

The clear restriction of this is that "native" generators of this design 
are not stand-alone objects, cannot survive outside a loop and work only 
within a single thread. More general generators, however, can easily be 
implemented as objects that offer a "native" generator as interface.

The only difficulty of this approach is that it has to be done at the 
assembler level because a unusual calling convention is needed. Typical 
high-level languages do not allow an emulation of this mechanism. I do 
not know about byte-code representations.

This whole mechanism is inspired by the Sather languages that first 
would have allowed it (even though that compiler never actually 
implemented it this way). To my knowledge, all other languages that 
offer generators did break this concept by allowing generators to exist 
beyond the lifetime of a loop and therefore depend on heap-allocated data.

I don't expect that D would ever adopt such a breaking change, even 
though it would be the most efficient implementation of generators while 
retaining the conceptually simple definition of co-routines. Still, I 
find the concept just so beautiful that I have to talk about it once in 
a while...

Greetings,
Norbert



bearophile wrote:
 D currently has two ways to define a generator, the first is the older one
with opApply (And opApplyReverse):
 int opApply(int delegate(ref Type [, ...]) dg);
 
 
 The syntax of opApply is terrible: hard to remember, bug-prone, noisy,
intrusive. And this syntax doesn't even gain performance, because dmd is not
very good at inlining here.
 Several times in the past I have said that a very good syntax for this is the
Python one:
 
 def foo():
   yield 5
 
 It's really easy to use and remember, not bug prone. This possible syntax is
about 75 times better than the current opApply:
 
 yield(int) foo() {
   yield 5;
 }
 
 
 That's sugar for something like:
 
 struct foo {
     int opApply(int delegate(ref int) dg) {
         int result;
         int r = 5;
         result = dg(r);
         if (result)
             goto END;
       END:
         return result;
     }
 }
 
 
 Python syntax was good enough that C# has copied it.
 
 
 The other way to create a struct/class generator in D2 is with the Range
protocol, with the methods that ask for the next item, etc. I have seen that
such generators sometimes are harder to write, because you have to manage the
state yourself, but they can be more efficient and they are more powerful.
 
 Both ways are useful, because they are useful for different situations. They
are one the inverted-out version of the other.
 
 
 The implementations of generators in C# and Python (and D, but not Lua) suffer
of a problem, that increases the computational complexity of the code when
there is a nested generator.
 
 And both Python and C# have found similar solutions that improve both the
syntax and the performance (both languages have not yet implemented it).
 
 Python version:
 http://python.org/dev/peps/pep-0380/
 
 With that this code:
 
 def foo():
   for x in some_iterable:
     yield x
 
 
 Becomes:
 
 def foo():
   yield from some_iterable
 
 
 In C# it can be called "yield foreach":
 
 http://kirillosenkov.blogspot.com/2007/10/yield-foreach.html
 
 http://connect.microsoft.com/VisualStudio/feedback/details/256934/yield-return-to-also-yield-collections
 
 http://blogs.msdn.com/wesdyer/archive/2007/03/23/all-about-iterators.aspx
 
 
 As D2 will start coming out of its embryo state, it too will probably have to
fece this problem with iterators. In the meantime I hope the opApply syntax
will be replaced by something better (the yield I've shown).
 
 Bye,
 bearophile

Mar 18 2010
parent reply Regan Heath <regan netmail.co.nz> writes:
Norbert Nemec wrote:
 I really like that simplified, "yield"-based syntax. However, I believe 
 that it conflicts fundamentally with D's confusing concept of generators.
 
 Most other languages that offer generators (e.g. Python) view the 
 generator as an object that spits out values one-by-one. This is very 
 similar to a co-routine.
 
 Purely for performance reasons, D approaches the issue in a backward 
 way, passing the loop body as a delegate. Indeed, this may be faster 
 than full-fledged co-routines, but it is rather difficult to map a 
 simplified coroutine-like syntax based on 'yield' statements to this 
 very different concept by mere "syntactic sugar". Furthermore, the D 
 concept scales awkwardly to combining multiple generators in one loop or 
 using nested generators.
 
 In fact, there is a simple but powerful way to design purely stack-based 
 generators, sticking to the straightforward concept of co-routines 
 without the overhead of full co-routines and without the need for 
 delegates (inspired by the Sather language):
 
 Make a generator a co-routine but restrict its lifetime to the inside of 
 a loop. When the loop starts, it calls all the contained generators to 
 initialize their local data on the regular stack. When the generators 
 yield, the stack-pointer is *not* reset (as it would be at a return) but 
 kept at its value. The generator routines can therefore be called again 
 within the same loop and find their local data still on the stack. When 
 the loop finishes, the stack pointer is reset to its initial value and 
 all the local data of all the contained generators is immediately released.

So, these generators are essentially inline functions? R
Mar 18 2010
parent reply Norbert Nemec <Norbert Nemec-online.de> writes:
Regan Heath wrote:
 So, these generators are essentially inline functions?

No. They may be inlined, but this is just an optimization just like for regular functions. If they are not inlined, these generators are simply functions with a special calling convention that allows them to retain their local data and execution point on the stack in between calls. I did work out the assembly code for this some time ago and it worked nicely. Very similar to what is needed for co-routines, but it is possible to do everything on the stack.
Mar 18 2010
parent reply Walter Bright <newshound1 digitalmars.com> writes:
Norbert Nemec wrote:
 Regan Heath wrote:
 So, these generators are essentially inline functions?

No. They may be inlined, but this is just an optimization just like for regular functions. If they are not inlined, these generators are simply functions with a special calling convention that allows them to retain their local data and execution point on the stack in between calls. I did work out the assembly code for this some time ago and it worked nicely. Very similar to what is needed for co-routines, but it is possible to do everything on the stack.

How is that different from a local delegate that refers to local variables of the function it is lexically nested in?
Mar 18 2010
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 03/18/2010 01:49 PM, Walter Bright wrote:
 Norbert Nemec wrote:
 Regan Heath wrote:
 So, these generators are essentially inline functions?

No. They may be inlined, but this is just an optimization just like for regular functions. If they are not inlined, these generators are simply functions with a special calling convention that allows them to retain their local data and execution point on the stack in between calls. I did work out the assembly code for this some time ago and it worked nicely. Very similar to what is needed for co-routines, but it is possible to do everything on the stack.

How is that different from a local delegate that refers to local variables of the function it is lexically nested in?

I think the difference is that e.g. a coroutine can repeatedly return values from a loop. In contrast, the delegate must keep the state separate, which may be a tad more complicated. Andrei
Mar 18 2010
parent reply Walter Bright <newshound1 digitalmars.com> writes:
Andrei Alexandrescu wrote:
 On 03/18/2010 01:49 PM, Walter Bright wrote:
 Norbert Nemec wrote:
 Regan Heath wrote:
 So, these generators are essentially inline functions?

No. They may be inlined, but this is just an optimization just like for regular functions. If they are not inlined, these generators are simply functions with a special calling convention that allows them to retain their local data and execution point on the stack in between calls. I did work out the assembly code for this some time ago and it worked nicely. Very similar to what is needed for co-routines, but it is possible to do everything on the stack.

How is that different from a local delegate that refers to local variables of the function it is lexically nested in?

I think the difference is that e.g. a coroutine can repeatedly return values from a loop. In contrast, the delegate must keep the state separate, which may be a tad more complicated.

Right, the delegate would have to keep its persistent state in outer locals. The trouble with a generator using the caller's stack is that then the generator cannot recursively call itself (such as if it was walking a tree). In order to work properly in the general case, a generator has to allocate all its local variables on the heap.
Mar 18 2010
parent reply Norbert Nemec <Norbert Nemec-online.de> writes:
Walter Bright wrote:
 The trouble with a generator using the caller's stack is that then the 
 generator cannot recursively call itself (such as if it was walking a 
 tree). In order to work properly in the general case, a generator has to 
 allocate all its local variables on the heap.

Which concept of generators do you refer to? The stack-based generators that I suggested do in fact work recursively. A generator can obtain its values from other generators or even recursively from itself without needing heap space anywhere.
Mar 19 2010
parent reply Walter Bright <newshound1 digitalmars.com> writes:
Norbert Nemec wrote:
 Walter Bright wrote:
 The trouble with a generator using the caller's stack is that then the 
 generator cannot recursively call itself (such as if it was walking a 
 tree). In order to work properly in the general case, a generator has 
 to allocate all its local variables on the heap.

Which concept of generators do you refer to? The stack-based generators that I suggested do in fact work recursively. A generator can obtain its values from other generators or even recursively from itself without needing heap space anywhere.

How can it do that, and yet allow the calling function also call other functions?
Mar 19 2010
parent Norbert Nemec <Norbert Nemec-online.de> writes:
Walter Bright wrote:
 Norbert Nemec wrote:
 Walter Bright wrote:
 The trouble with a generator using the caller's stack is that then 
 the generator cannot recursively call itself (such as if it was 
 walking a tree). In order to work properly in the general case, a 
 generator has to allocate all its local variables on the heap.

Which concept of generators do you refer to? The stack-based generators that I suggested do in fact work recursively. A generator can obtain its values from other generators or even recursively from itself without needing heap space anywhere.

How can it do that, and yet allow the calling function also call other functions?

When a generator is called for the first time, it shifts the ESP to make room for its own local variables. When it yields, the ESP remains where it is, so the generator's local data is protected. At any time, the ESP can be used as usual for regular function calls. Whenever a loop is entered, it needs some additional space on the stack for all the local data of contained generators. This stack space is finite, simply defined by the sum of the local data of each contained generator. The loop's stack space is released when the loop is terminated. Any loops nested inside the outer loop, perhaps inside generators, will reserve additional stack space, but none of these stack spaces can survive the lifetime of an outer loop, so whenever a loop exits, all the stack spaces if inner loops can be freed as well. I will try to work out an example in assembly code, but I have not used assembly in years, so it will take some time.
Mar 19 2010
prev sibling parent Norbert Nemec <Norbert Nemec-online.de> writes:
Walter Bright wrote:
 Norbert Nemec wrote:
 Regan Heath wrote:
 So, these generators are essentially inline functions?

No. They may be inlined, but this is just an optimization just like for regular functions. If they are not inlined, these generators are simply functions with a special calling convention that allows them to retain their local data and execution point on the stack in between calls. I did work out the assembly code for this some time ago and it worked nicely. Very similar to what is needed for co-routines, but it is possible to do everything on the stack.

How is that different from a local delegate that refers to local variables of the function it is lexically nested in?

It is pretty much the same in the trivial case of one loop having one generator in the loop header. The difference shows up in more complex cases: * Try to design a "filter" (a generator that takes another generator as argument) - may be possible but probably ugly to implement. * Try to run a loop that pulls its values pairwise from two independent generators - I don't see a way to do this without serious mental acrobatic. Apart from that, I guess that the runtime overhead without inlining is somewhat different. Would have to investigate that in more detail.
Mar 19 2010