www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - pure generators

reply bearophile <bearophileHUGS lycos.com> writes:
This is a post of unfocused musings, so you can ignore this post if you have
better things to do.

'generators' are something like structs that contain an opApply or the methods
of the Range protocol:


import std.stdio: write;

struct XPrimes {
    int stop;

    int opApply(int delegate(ref int) dg) {
        int result;
        foreach (x; 2 .. stop) {
            bool all = true;
            foreach (i; 2 .. x)
                if (!(x % i)) {
                    all = false;
                    break;
                }
            if (all) {
                result = dg(x);
                if (result)
                    break;
            }
        }
        return result;
    }
}

void main() {
    foreach (p; XPrimes(30))
        write(p, " ");
}



This is how you can do the same thing in Python, the code is a bit shorter:


 p = (x for x in xrange(2,30) if all(x % i for i in xrange(2,x)))
 list(p)



 list(p)



That code also shows that Python generators can be used only once. So using Range speech, they are all "Input Ranges", I presume to avoid indemonstrable assumptions about their nature that can lead to bugs. In D the situation is different, this prints the primes two times: import std.stdio: write, writeln; class XPrimes { int stop; this(int s) { stop = s; } int opApply(int delegate(ref int) dg) { int result; foreach (x; 2 .. stop) { bool all = true; foreach (i; 2 .. x) if (!(x % i)) { all = false; break; } if (all) { result = dg(x); if (result) break; } } return result; } } void main() { auto primes = new XPrimes(30); foreach (p; primes) write(p, " "); writeln(); foreach (p; primes) write(p, " "); } In few more years of D learning I hope to understand why D iteration is designed that way, and if there are problems in it. In practice a generator like: (x for x in xrange(2,N) if all(x % i for i in xrange(2,x))) is kind of pure, because despite being lazy, the output it generates is only influences by an input parameter N. So it's a Forward Range in D speech. In theory the D Ranges can grow have a standard constant that the programmer can add to tell if it's an Input Range or Forward Range. I don't know if this is a good idea, the compiler can't verify that this constant value is right. So in practice the idea of pure generators can be seen as a way (that the compiler is able to verify) that a generator is not just an Input Range, but a Forward Range. It can be possible to test it using similar ways used to test that a function marked with pure is actually pure. What can pure generators be useful for in D? In Haskell many "generators" are pure, this allows to do things like: fibonacci = 1 : 1 : zipWith (+) fibonacci (tail fibonacci) fibs = take 20 fibonacci main = print fibs That prints: [1,1,2,3,5,8,13,21,34,55,89,144,233,377,610,987,1597,2584,4181,6765] The meaning of that little Haskell program is simple: fibonacci is a symbol, it can be seen as a lazy generator that yields a list of numbers. The first and second are 1 (the colon stands for list item concatenation), to it concatenates the zipping between the result of the generator and another "copy" of the same generator minus the first item. You can write the same algorithm in Python using lazy iterators: from itertools import izip, islice def fibonacci(): yield 1 yield 1 for nm1, n in izip(fibonacci(), islice(fibonacci(), 1, None)): yield nm1 + n fibs = islice(fibonacci(), 0, 20) print list(fibs) (I have tried to write this same program in D1 but it seems rather impossible. I think it's possible in D2 but its debugging gets too much complex for my tastes. In D I'd like a nicer yield as in Python and Chapel). Even if doesn't show, the Haskell compiler is able to optimize quite well that little program, it can generate many Fibonacci numbers in a short time, while that Python code is really slow, uses tons of RAM, and it soon blows up the stack. (Lazyness in Python is easy to use, but it was bolted on Python at the last minute and it's not always good enough). A pure generator means that it always yields the same sequence of outputs given the same input. This allows for optimizations done by Haskell compiler in that example. In theory a D compiler can do the same optimizations. pure yield(int) primes(int stop) { foreach (x; 2 .. stop) { // an all() HOF can help here bool all = true; foreach (i; 2 .. x) if (!(x % i)) { all = false; break; } if (all) yield x; } } I am quite ignorant still about such topics of lazy pure functional programming. Bye, bearophile
May 12 2010
next sibling parent "Simen kjaeraas" <simen.kjaras gmail.com> writes:
bearophile <bearophileHUGS lycos.com> wrote:

 In few more years of D learning I hope to understand why D iteration is  
 designed that way, and if there are problems in it.

This has to do with D iteration being more designed towards mimicking C arrays. In D2, you'd use a range, and possibly have popFront return the new (changed) range.
 In practice a generator like:

 (x for x in xrange(2,N) if all(x % i for i in xrange(2,x)))

 is kind of pure, because despite being lazy, the output it generates is  
 only influences by an input parameter N. So it's a Forward Range in D  
 speech.

Except that it also changes the input. It seems to me the python example uses a closure, which, as Norman Adams allegedly said, "are a poor man's objects." This closure must then change its associated state upon iteration. (Granted, coroutines may be closer to the truth here, but I have not quite grokked those yet, and the end result is basically the same.)
 pure yield(int) primes(int stop) {
     foreach (x; 2 .. stop) {
         // an all() HOF can help here
         bool all = true;
         foreach (i; 2 .. x)
             if (!(x % i)) {
                 all = false;
                 break;
             }
         if (all)
             yield x;
     }
 }

Here's an implementation that seems to work: module foo; import core.thread; import std.stdio; class YieldingFiber( T ) : Fiber { this( void function( ) fn ) { super( fn ); } this( void delegate( ) fn ) { super( fn ); } T call( ) { if ( auto o = super.call( ) ) { throw o; } return _value; } static void yield( T value ) { if ( auto p = cast( typeof( this ) )getThis( ) ) { p._value = value; } Fiber.yield( ); } private: T _value; } void primes( ) { int x = 2; while ( true ) { bool all = true; foreach ( i; 2 .. x ) { if ( !( x % i ) ) { all = false; break; } } if ( all ) { YieldingFiber!( int ).yield( x ); } ++x; } } void main( ) { auto primes = new YieldingFiber!( int )( &primes ); foreach ( i; 0..20 ) { writeln( primes.call( ) ); } }
 I am quite ignorant still about such topics of lazy pure functional  
 programming.

Welcome to the club! -- Simen
May 12 2010
prev sibling next sibling parent Jason House <jason.james.house gmail.com> writes:
pure is a keyword in D2 that means something different than how you're using
it. I also find it unusual that you use D2 range terminology but then give
examples that don't use D ranges!

I'm also fairly ignorant about what is in std.algorithm, but you should be able
to find things that look more like your functional code examples.

bearophile Wrote:

 This is a post of unfocused musings, so you can ignore this post if you have
better things to do.
 
 'generators' are something like structs that contain an opApply or the methods
of the Range protocol:
 
 
 import std.stdio: write;
 
 struct XPrimes {
     int stop;
 
     int opApply(int delegate(ref int) dg) {
         int result;
         foreach (x; 2 .. stop) {
             bool all = true;
             foreach (i; 2 .. x)
                 if (!(x % i)) {
                     all = false;
                     break;
                 }
             if (all) {
                 result = dg(x);
                 if (result)
                     break;
             }
         }
         return result;
     }
 }
 
 void main() {
     foreach (p; XPrimes(30))
         write(p, " ");
 }
 
 
 
 This is how you can do the same thing in Python, the code is a bit shorter:
 
 
 p = (x for x in xrange(2,30) if all(x % i for i in xrange(2,x)))
 list(p)



 list(p)



That code also shows that Python generators can be used only once. So using Range speech, they are all "Input Ranges", I presume to avoid indemonstrable assumptions about their nature that can lead to bugs. In D the situation is different, this prints the primes two times: import std.stdio: write, writeln; class XPrimes { int stop; this(int s) { stop = s; } int opApply(int delegate(ref int) dg) { int result; foreach (x; 2 .. stop) { bool all = true; foreach (i; 2 .. x) if (!(x % i)) { all = false; break; } if (all) { result = dg(x); if (result) break; } } return result; } } void main() { auto primes = new XPrimes(30); foreach (p; primes) write(p, " "); writeln(); foreach (p; primes) write(p, " "); } In few more years of D learning I hope to understand why D iteration is designed that way, and if there are problems in it. In practice a generator like: (x for x in xrange(2,N) if all(x % i for i in xrange(2,x))) is kind of pure, because despite being lazy, the output it generates is only influences by an input parameter N. So it's a Forward Range in D speech. In theory the D Ranges can grow have a standard constant that the programmer can add to tell if it's an Input Range or Forward Range. I don't know if this is a good idea, the compiler can't verify that this constant value is right. So in practice the idea of pure generators can be seen as a way (that the compiler is able to verify) that a generator is not just an Input Range, but a Forward Range. It can be possible to test it using similar ways used to test that a function marked with pure is actually pure. What can pure generators be useful for in D? In Haskell many "generators" are pure, this allows to do things like: fibonacci = 1 : 1 : zipWith (+) fibonacci (tail fibonacci) fibs = take 20 fibonacci main = print fibs That prints: [1,1,2,3,5,8,13,21,34,55,89,144,233,377,610,987,1597,2584,4181,6765] The meaning of that little Haskell program is simple: fibonacci is a symbol, it can be seen as a lazy generator that yields a list of numbers. The first and second are 1 (the colon stands for list item concatenation), to it concatenates the zipping between the result of the generator and another "copy" of the same generator minus the first item. You can write the same algorithm in Python using lazy iterators: from itertools import izip, islice def fibonacci(): yield 1 yield 1 for nm1, n in izip(fibonacci(), islice(fibonacci(), 1, None)): yield nm1 + n fibs = islice(fibonacci(), 0, 20) print list(fibs) (I have tried to write this same program in D1 but it seems rather impossible. I think it's possible in D2 but its debugging gets too much complex for my tastes. In D I'd like a nicer yield as in Python and Chapel). Even if doesn't show, the Haskell compiler is able to optimize quite well that little program, it can generate many Fibonacci numbers in a short time, while that Python code is really slow, uses tons of RAM, and it soon blows up the stack. (Lazyness in Python is easy to use, but it was bolted on Python at the last minute and it's not always good enough). A pure generator means that it always yields the same sequence of outputs given the same input. This allows for optimizations done by Haskell compiler in that example. In theory a D compiler can do the same optimizations. pure yield(int) primes(int stop) { foreach (x; 2 .. stop) { // an all() HOF can help here bool all = true; foreach (i; 2 .. x) if (!(x % i)) { all = false; break; } if (all) yield x; } } I am quite ignorant still about such topics of lazy pure functional programming. Bye, bearophile

May 12 2010
prev sibling next sibling parent reply Pelle <pelle.mansson gmail.com> writes:
On 05/13/2010 01:46 AM, bearophile wrote:
 The meaning of that little Haskell program is simple:
 fibonacci is a symbol, it can be seen as a lazy generator that yields a list
of numbers. The first and second are 1 (the colon stands for list item
concatenation), to it concatenates the zipping between the result of the
generator and another "copy" of the same generator minus the first item.

It is stored as a list, and is in effect memoized. That's why it's a viable solution for haskell, and sucks in python. You can, naturally, create the same in D. It will be more verbose, though. ...So I did. struct Fibs { immutable(int)[] m = [0, 1]; int front() { return m[$-1]; } void popFront() { m ~= m[$-2] + m[$-1]; } enum bool empty = false; int opIndex(size_t a) { a += 1; while (a >= m.length) popFront; return m[a]; } } Fibs fibonacchi; That should be about the same. :)
May 13 2010
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Thank you for all the answers.


Simen kjaeraas:

This has to do with D iteration being more designed towards mimicking C arrays.<

I am not able to see the link between the two things.
In D2, you'd use a range, and possibly have popFront return the new (changed)
range.<

In D2 both opApply and Range protocol can be used because they are fit for different purposes. In several situations keeping the state manually (As the Range protocol requires) requires more complex code compared to opApply.
 (x for x in xrange(2,N) if all(x % i for i in xrange(2,x)))

Except that it also changes the input.

The input there is N, and it doesn't get changed. You can see a pure generator like a pure function that returns an array of items. The main difference is that the items of such array are computed & given lazily.
It seems to me the python example uses a closure,<

In practice Python uses closures all the time :-) Somewhere that Python generator has to keep memory (state) of where the variable 'i' is.
This closure must then change its associated state upon iteration.<

It has a state it changes, but outside the generator the existance of such state can be ignored. The whole point of my post is the idea of something purish, the output of that generator is determined only by the input (N), it generates its results only once and then it stops.
Here's an implementation that seems to work:<

The code worth traslating in D2 was the Haskell Fibonacci generator. Thank you for your code, but it's too much noisy, in normal situations I am not willing to write similar code. ------------------------ Jason House:
pure is a keyword in D2 that means something different than how you're using
it.<

D2 compiler contains the idea of "pure function" and "pure method", while I am talking about "pure generator", an idea that is not present natively in the D2 compiler.
I'm also fairly ignorant about what is in std.algorithm, but you should be able
to find things that look more like your functional code examples.<

You are probably right here, but this is beside the main point of my post. ------------------------ Pelle:
It is stored as a list, and is in effect memoized.<

In practice I think Haskell doesn't memoize things there. I think it optimizes most things away. The main point of pure generators is indeed that they give the compiler enough guarantees (just as pure functions) that allow it to perform those optimizations safely. The point of my original post is that I'd like a way to give the D compiler such semantic information, mostly for optimization purposes (but there are other purposes, similarly to the purposes of pure functions).
You can, naturally, create the same in D. It will be more verbose, though.
[...] That should be about the same. :)<

Beside being not a pure generator (a semantics that currently can't be expressed to the D2 compiler), your code has very little to do with the way Haskell compiles that program. I can be wrong because I am don't know much about the Haskell compiler(s). Bye, bearophile
May 13 2010
parent Pelle <pelle.mansson gmail.com> writes:
On 05/13/2010 01:22 PM, bearophile wrote:
 Pelle:

 It is stored as a list, and is in effect memoized.<

In practice I think Haskell doesn't memoize things there. I think it optimizes most things away. The main point of pure generators is indeed that they give the compiler enough guarantees (just as pure functions) that allow it to perform those optimizations safely. The point of my original post is that I'd like a way to give the D compiler such semantic information, mostly for optimization purposes (but there are other purposes, similarly to the purposes of pure functions).

Yes, it does. Try it! Prelude> let fibs = 1 : 1 : zipWith (+) fibs (tail fibs) Prelude> fibs !! 80000 ... some time passes, a huge number appears. Prelude> fibs !! 80000 ... almost instantly, the same number. Edit the 80000 to a large enough number to slow your system down. Note that all numbers below the large number chosen is very fast as well, once the large calculation is done.
May 13 2010
prev sibling parent "Simen kjaeraas" <simen.kjaras gmail.com> writes:
bearophile <bearophileHUGS lycos.com> wrote:

 This has to do with D iteration being more designed towards mimicking C  
 arrays.<

I am not able to see the link between the two things.

An array is something you can use several times. The opApply functionality is based on this behavior (not necessarily, but the common usage, and specifically your usage in the provided code). For something more like the Python generators, look at D2 ranges. An example of those, for this specific usage, is included below.
 (x for x in xrange(2,N) if all(x % i for i in xrange(2,x)))

 Except that it also changes the input.

The input there is N, and it doesn't get changed. You can see a pure generator like a pure function that returns an array of items. The main difference is that the items of such array are computed & given lazily.

Your input to creating the generator, yes. After that, the generator is its own input. So the constructor is pure, its workings beyond that point are not. The equivalent in D would be a struct with a member function that does not change any external variables, but does change internal state. Not pure the way D has defined that word (Calling the function several times with the same input does not yield the same output, unless you're willing to call a different input with the same name the same input.).
 Here's an implementation that seems to work:<

The code worth traslating in D2 was the Haskell Fibonacci generator. Thank you for your code, but it's too much noisy, in normal situations I am not willing to write similar code.

Now, the right way to write a Fibonacci generator in D is this: import std.range; ... auto fib = recurrence!("a[n-1] + a[n-2]")(1, 1); Not as aesthetic as the equivalent Haskell code, I will admit. Also, updated code below. It's now a range, and supports much cleaner initialization and use: import core.thread; import std.range; import std.stdio; import std.array; import std.math; class YieldingFiber( T ) : Fiber { this( void function( ) fn ) { super( fn ); call( ); } this( void delegate( ) fn ) { super( fn ); call( ); } T call( ) { if ( auto o = super.call( ) ) { throw o; } return _value; } property T front( ) const { return _value; } alias call popFront; property bool empty( ) { return state == Fiber.State.TERM; } static void yield( T value ) { if ( auto p = cast( typeof( this ) )getThis( ) ) { p._value = value; } else { throw new Exception( "Invalid yield type." ); } Fiber.yield( ); } private: T _value; } void yield( T )( T value ) { YieldingFiber!( T ).yield( value ); } YieldingFiber!( T ) generator( T, U )( U fn ) { return new YieldingFiber!( T )( fn ); } // Non-reusable code from here. void primes( ) { loop: for ( int x = 2; ; ++x ) { foreach ( i; 2..ceil( sqrt( x + 0.0 ) ) ) { if ( !( x % i ) ) { continue loop; } } yield( x ); } } void main( ) { auto p = generator!( int )( &primes ); writeln( "Primes:" ); writeln( array( take( p, 20 ) ) ); auto fib = generator!int({ int a = 1, b = 1; while ( 1 ){ yield( a ); int c = a+b; a = b; b = c; } }); writeln( "Fibonacci:" ); writeln( array( take( fib, 20 ) ) ); } Please note that the YieldingFiber class and the yield and generator functions, are generic and not something one would need to reimplement each time one wants a generator. -- Simen
May 13 2010
prev sibling next sibling parent Bane <branimir.milosavljevic gmail.com> writes:
2 words (not relating to true meaning of this post):

Crap engine.

My first association seeing 'pure generators' subject. Is there better way to
counter balance them, and make a useful machine, running on type of fuel that
is in abundance here where we live? Better than perpetuum mobile.
May 13 2010
prev sibling parent reply Graham Fawcett <fawcett uwindsor.ca> writes:
Hi bearophile,

On Wed, 12 May 2010 19:46:30 -0400, bearophile wrote:

 In Haskell many "generators" are pure, this allows to do things
 like:

 fibonacci = 1 : 1 : zipWith (+) fibonacci (tail fibonacci) fibs = take
 20 fibonacci
 main = print fibs

It's not purity that makes the this code possible in Haskell, it's lazy evaluation.
 That prints:
 [1,1,2,3,5,8,13,21,34,55,89,144,233,377,610,987,1597,2584,4181,6765]
 
 The meaning of that little Haskell program is simple: fibonacci is a
 symbol, it can be seen as a lazy generator that yields a list of
 numbers. The first and second are 1 (the colon stands for list item
 concatenation), to it concatenates the zipping between the result of the
 generator and another "copy" of the same generator minus the first item.
 
 You can write the same algorithm in Python using lazy iterators:
 
 from itertools import izip, islice
 
 def fibonacci():
     yield 1
     yield 1
     for nm1, n in izip(fibonacci(), islice(fibonacci(), 1, None)):
         yield nm1 + n
 
 fibs = islice(fibonacci(), 0, 20)
 print list(fibs)

The comparison is unfair, because those two programs don't do the same thing -- it looks like they do, but that's a surface similarity. In the Python code, 'fibonacci' is a constructor that returns a generator. The body of fibonacci() does not define a function, but rather a coroutine. Calling the fibonacci() constructor "recursively" leads to a quadratic growth of generator instances. If the stack didn't blow so quickly, the heap would soon follow. In the Haskell code, 'fibonacci' is a list instance: a lazy, infinite list. The list refers to itself: there is only one instance of it. This is not equivalent to calling a recursive function; rather you are creating a recursive, lazy data structure. Because you maintain a reference to the head of this list, the heap will be consumed -- linearly -- as the tail of list is incrementally evaluated.
 Even if doesn't show, the Haskell compiler is able to optimize quite
 well that little program, it can generate many Fibonacci numbers in a
 short time, while that Python code is really slow, uses tons of RAM, and
 it soon blows up the stack. (Lazyness in Python is easy to use, but it
 was bolted on Python at the last minute and it's not always good
 enough).

Don't blame Python; you're just doing it wrong. :) Python generators are not lazy lists, and they aren't even functions, they are coroutines. Apples, oranges, and lemons. The Fibonacci sequence can be implemented succinctly and efficiently as a Python generator: def fibonacci(): a, b = 1, 1 while True: yield a a, b = b, a + b If you use that definition in the following program, which prints the infinite sequence: for n in fibonacci(): print n ...it will run as fast as Python can add two bignums, and it will run in constant space. Graham
May 13 2010
parent bearophile <bearophileHUGS lycos.com> writes:
Graham Fawcett:

 def fibonacci():
     a, b = 1, 1
     while True:
         yield a
         a, b = b, a + b

This is my version I use (doctests and docstring removed), a bit faster with Python 2.x: def xfibonacci(): a, b = 0, 1 while 1: yield b a += b yield a b += a Thank you for your explanations. I am too much ignorant still to talk about those things :-) Bye, bearophile
May 13 2010