digitalmars.D - pure generators

bearophile (105/108) May 12 2010 This is a post of unfocused musings, so you can ignore this post if you ...

Simen kjaeraas (62/84) May 12 2010 This has to do with D iteration being more designed towards mimicking
Jason House (3/162) May 12 2010 pure is a keyword in D2 that means something different than how you're u...
Pelle (20/22) May 13 2010 It is stored as a list, and is in effect memoized. That's why it's a

bearophile (21/32) May 13 2010 I am not able to see the link between the two things.

Simen kjaeraas (93/106) May 13 2010 An array is something you can use several times. The opApply
Pelle (10/13) May 13 2010 Yes, it does. Try it!

Bane (3/3) May 13 2010 2 words (not relating to true meaning of this post):
Graham Fawcett (33/65) May 13 2010 It's not purity that makes the this code possible in Haskell, it's

bearophile (12/17) May 13 2010 This is my version I use (doctests and docstring removed), a bit faster ...

bearophile <bearophileHUGS lycos.com> writes:

This is a post of unfocused musings, so you can ignore this post if you have
better things to do.

'generators' are something like structs that contain an opApply or the methods
of the Range protocol:


import std.stdio: write;

struct XPrimes {
    int stop;

    int opApply(int delegate(ref int) dg) {
        int result;
        foreach (x; 2 .. stop) {
            bool all = true;
            foreach (i; 2 .. x)
                if (!(x % i)) {
                    all = false;
                    break;
                }
            if (all) {
                result = dg(x);
                if (result)
                    break;
            }
        }
        return result;
    }
}

void main() {
    foreach (p; XPrimes(30))
        write(p, " ");
}



This is how you can do the same thing in Python, the code is a bit shorter:


 p = (x for x in xrange(2,30) if all(x % i for i in xrange(2,x)))
 list(p)



[2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
 list(p)



[]



That code also shows that Python generators can be used only once. So using
Range speech, they are all "Input Ranges", I presume to avoid indemonstrable
assumptions about their nature that can lead to bugs.

In D the situation is different, this prints the primes two times:


import std.stdio: write, writeln;

class XPrimes {
    int stop;
    this(int s) { stop = s; }

    int opApply(int delegate(ref int) dg) {
        int result;
        foreach (x; 2 .. stop) {
            bool all = true;
            foreach (i; 2 .. x)
                if (!(x % i)) {
                    all = false;
                    break;
                }
            if (all) {
                result = dg(x);
                if (result)
                    break;
            }
        }
        return result;
    }
}

void main() {
    auto primes = new XPrimes(30);
    foreach (p; primes)
        write(p, " ");
    writeln();
    foreach (p; primes)
        write(p, " ");
}


In few more years of D learning I hope to understand why D iteration is
designed that way, and if there are problems in it.


In practice a generator like:

(x for x in xrange(2,N) if all(x % i for i in xrange(2,x)))

is kind of pure, because despite being lazy, the output it generates is only
influences by an input parameter N. So it's a Forward Range in D speech.


In theory the D Ranges can grow have a standard constant that the programmer
can add to tell if it's an Input Range or Forward Range. I don't know if this
is a good idea, the compiler can't verify that this constant value is right.

So in practice the idea of pure generators can be seen as a way (that the
compiler is able to verify) that a generator is not just an Input Range, but a
Forward Range. It can be possible to test it using similar ways used to test
that a function marked with pure is actually pure.

What can pure generators be useful for in D?

In Haskell many "generators" are pure, this allows to do things like:


fibonacci = 1 : 1 : zipWith (+) fibonacci (tail fibonacci)
fibs = take 20 fibonacci
main = print fibs


That prints:
[1,1,2,3,5,8,13,21,34,55,89,144,233,377,610,987,1597,2584,4181,6765]


The meaning of that little Haskell program is simple:
fibonacci is a symbol, it can be seen as a lazy generator that yields a list of
numbers. The first and second are 1 (the colon stands for list item
concatenation), to it concatenates the zipping between the result of the
generator and another "copy" of the same generator minus the first item.


You can write the same algorithm in Python using lazy iterators:


from itertools import izip, islice

def fibonacci():
    yield 1
    yield 1
    for nm1, n in izip(fibonacci(), islice(fibonacci(), 1, None)):
        yield nm1 + n

fibs = islice(fibonacci(), 0, 20)
print list(fibs)


(I have tried to write this same program in D1 but it seems rather impossible.
I think it's possible in D2 but its debugging gets too much complex for my
tastes. In D I'd like a nicer yield as in Python and Chapel).

Even if doesn't show, the Haskell compiler is able to optimize quite well that
little program, it can generate many Fibonacci numbers in a short time, while
that Python code is really slow, uses tons of RAM, and it soon blows up the
stack. (Lazyness in Python is easy to use, but it was bolted on Python at the
last minute and it's not always good enough).

A pure generator means that it always yields the same sequence of outputs given
the same input. This allows for optimizations done by Haskell compiler in that
example. In theory a D compiler can do the same optimizations.


pure yield(int) primes(int stop) {
    foreach (x; 2 .. stop) {
        // an all() HOF can help here
        bool all = true;
        foreach (i; 2 .. x)
            if (!(x % i)) {
                all = false;
                break;
            }
        if (all)
            yield x;
    }
}

I am quite ignorant still about such topics of lazy pure functional programming.

Bye,
bearophile

May 12 2010

"Simen kjaeraas" <simen.kjaras gmail.com> writes:

bearophile <bearophileHUGS lycos.com> wrote:

 In few more years of D learning I hope to understand why D iteration is  
 designed that way, and if there are problems in it.

This has to do with D iteration being more designed towards mimicking
C arrays.
In D2, you'd use a range, and possibly have popFront return the new
(changed) range.

 In practice a generator like:

 (x for x in xrange(2,N) if all(x % i for i in xrange(2,x)))

 is kind of pure, because despite being lazy, the output it generates is  
 only influences by an input parameter N. So it's a Forward Range in D  
 speech.

Except that it also changes the input. It seems to me the python example
uses a closure, which, as Norman Adams allegedly said, "are a poor man's
objects." This closure must then change its associated state upon
iteration. (Granted, coroutines may be closer to the truth here, but I
have not quite grokked those yet, and the end result is basically the
same.)

 pure yield(int) primes(int stop) {
     foreach (x; 2 .. stop) {
         // an all() HOF can help here
         bool all = true;
         foreach (i; 2 .. x)
             if (!(x % i)) {
                 all = false;
                 break;
             }
         if (all)
             yield x;
     }
 }

Here's an implementation that seems to work:


module foo;

import core.thread;
import std.stdio;

class YieldingFiber( T ) : Fiber {
   this( void function( ) fn ) {
     super( fn );
   }

   this( void delegate( ) fn ) {
     super( fn );
   }

   T call( ) {
     if ( auto o = super.call( ) ) {
       throw o;
     }
     return _value;
   }

   static void yield( T value ) {
     if ( auto p = cast( typeof( this ) )getThis( ) ) {
       p._value = value;
     }
     Fiber.yield( );
   }
private:
   T _value;
}

void primes( ) {
   int x = 2;
   while ( true ) {
     bool all = true;
     foreach ( i; 2 .. x ) {
       if ( !( x % i ) ) {
         all = false;
         break;
       }
     }
     if ( all ) {
       YieldingFiber!( int ).yield( x );
     }
     ++x;
   }
}

void main( ) {
   auto primes = new YieldingFiber!( int )( &primes );
   foreach ( i; 0..20 ) {
     writeln( primes.call( ) );
   }
}

 I am quite ignorant still about such topics of lazy pure functional  
 programming.

Welcome to the club!

-- 
Simen

May 12 2010

Jason House <jason.james.house gmail.com> writes:

pure is a keyword in D2 that means something different than how you're using
it. I also find it unusual that you use D2 range terminology but then give
examples that don't use D ranges!

I'm also fairly ignorant about what is in std.algorithm, but you should be able
to find things that look more like your functional code examples.

bearophile Wrote:

 This is a post of unfocused musings, so you can ignore this post if you have
better things to do.
 
 'generators' are something like structs that contain an opApply or the methods
of the Range protocol:
 
 
 import std.stdio: write;
 
 struct XPrimes {
     int stop;
 
     int opApply(int delegate(ref int) dg) {
         int result;
         foreach (x; 2 .. stop) {
             bool all = true;
             foreach (i; 2 .. x)
                 if (!(x % i)) {
                     all = false;
                     break;
                 }
             if (all) {
                 result = dg(x);
                 if (result)
                     break;
             }
         }
         return result;
     }
 }
 
 void main() {
     foreach (p; XPrimes(30))
         write(p, " ");
 }
 
 
 
 This is how you can do the same thing in Python, the code is a bit shorter:
 
 
 p = (x for x in xrange(2,30) if all(x % i for i in xrange(2,x)))
 list(p)



 [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
 list(p)



 []
 
 
 
 That code also shows that Python generators can be used only once. So using
Range speech, they are all "Input Ranges", I presume to avoid indemonstrable
assumptions about their nature that can lead to bugs.
 
 In D the situation is different, this prints the primes two times:
 
 
 import std.stdio: write, writeln;
 
 class XPrimes {
     int stop;
     this(int s) { stop = s; }
 
     int opApply(int delegate(ref int) dg) {
         int result;
         foreach (x; 2 .. stop) {
             bool all = true;
             foreach (i; 2 .. x)
                 if (!(x % i)) {
                     all = false;
                     break;
                 }
             if (all) {
                 result = dg(x);
                 if (result)
                     break;
             }
         }
         return result;
     }
 }
 
 void main() {
     auto primes = new XPrimes(30);
     foreach (p; primes)
         write(p, " ");
     writeln();
     foreach (p; primes)
         write(p, " ");
 }
 
 
 In few more years of D learning I hope to understand why D iteration is
designed that way, and if there are problems in it.
 
 
 In practice a generator like:
 
 (x for x in xrange(2,N) if all(x % i for i in xrange(2,x)))
 
 is kind of pure, because despite being lazy, the output it generates is only
influences by an input parameter N. So it's a Forward Range in D speech.
 
 
 In theory the D Ranges can grow have a standard constant that the programmer
can add to tell if it's an Input Range or Forward Range. I don't know if this
is a good idea, the compiler can't verify that this constant value is right.
 
 So in practice the idea of pure generators can be seen as a way (that the
compiler is able to verify) that a generator is not just an Input Range, but a
Forward Range. It can be possible to test it using similar ways used to test
that a function marked with pure is actually pure.
 
 What can pure generators be useful for in D?
 
 In Haskell many "generators" are pure, this allows to do things like:
 
 
 fibonacci = 1 : 1 : zipWith (+) fibonacci (tail fibonacci)
 fibs = take 20 fibonacci
 main = print fibs
 
 
 That prints:
 [1,1,2,3,5,8,13,21,34,55,89,144,233,377,610,987,1597,2584,4181,6765]
 
 
 The meaning of that little Haskell program is simple:
 fibonacci is a symbol, it can be seen as a lazy generator that yields a list
of numbers. The first and second are 1 (the colon stands for list item
concatenation), to it concatenates the zipping between the result of the
generator and another "copy" of the same generator minus the first item.
 
 
 You can write the same algorithm in Python using lazy iterators:
 
 
 from itertools import izip, islice
 
 def fibonacci():
     yield 1
     yield 1
     for nm1, n in izip(fibonacci(), islice(fibonacci(), 1, None)):
         yield nm1 + n
 
 fibs = islice(fibonacci(), 0, 20)
 print list(fibs)
 
 
 (I have tried to write this same program in D1 but it seems rather impossible.
I think it's possible in D2 but its debugging gets too much complex for my
tastes. In D I'd like a nicer yield as in Python and Chapel).
 
 Even if doesn't show, the Haskell compiler is able to optimize quite well that
little program, it can generate many Fibonacci numbers in a short time, while
that Python code is really slow, uses tons of RAM, and it soon blows up the
stack. (Lazyness in Python is easy to use, but it was bolted on Python at the
last minute and it's not always good enough).
 
 A pure generator means that it always yields the same sequence of outputs
given the same input. This allows for optimizations done by Haskell compiler in
that example. In theory a D compiler can do the same optimizations.
 
 
 pure yield(int) primes(int stop) {
     foreach (x; 2 .. stop) {
         // an all() HOF can help here
         bool all = true;
         foreach (i; 2 .. x)
             if (!(x % i)) {
                 all = false;
                 break;
             }
         if (all)
             yield x;
     }
 }
 
 I am quite ignorant still about such topics of lazy pure functional
programming.
 
 Bye,
 bearophile

May 12 2010

Pelle <pelle.mansson gmail.com> writes:

On 05/13/2010 01:46 AM, bearophile wrote:
 The meaning of that little Haskell program is simple:
 fibonacci is a symbol, it can be seen as a lazy generator that yields a list
of numbers. The first and second are 1 (the colon stands for list item
concatenation), to it concatenates the zipping between the result of the
generator and another "copy" of the same generator minus the first item.

It is stored as a list, and is in effect memoized. That's why it's a 
viable solution for haskell, and sucks in python.

You can, naturally, create the same in D. It will be more verbose, though.

...So I did.

struct Fibs {
     immutable(int)[] m = [0, 1];

     int front() { return m[$-1]; }

     void popFront() {
         m ~= m[$-2] + m[$-1];
     }
     enum bool empty = false;

     int opIndex(size_t a) {
         a += 1;
         while (a >= m.length) popFront;
         return m[a];
     }
}
Fibs fibonacchi;

That should be about the same. :)

May 13 2010

bearophile <bearophileHUGS lycos.com> writes:

Thank you for all the answers.


Simen kjaeraas:

This has to do with D iteration being more designed towards mimicking C arrays.<

I am not able to see the link between the two things.


In D2, you'd use a range, and possibly have popFront return the new (changed)
range.<

In D2 both opApply and Range protocol can be used because they are fit for
different purposes. In several situations keeping the state manually (As the
Range protocol requires) requires more complex code compared to opApply.


 (x for x in xrange(2,N) if all(x % i for i in xrange(2,x)))

Except that it also changes the input.

The input there is N, and it doesn't get changed.
You can see a pure generator like a pure function that returns an array of
items. The main difference is that the items of such array are computed & given
lazily.


It seems to me the python example uses a closure,<

In practice Python uses closures all the time :-) Somewhere that Python
generator has to keep memory (state) of where the variable 'i' is.


This closure must then change its associated state upon iteration.<

It has a state it changes, but outside the generator the existance of such
state can be ignored.
The whole point of my post is the idea of something purish, the output of that
generator is determined only by the input (N), it generates its results only
once and then it stops.


Here's an implementation that seems to work:<

The code worth traslating in D2 was the Haskell Fibonacci generator.
Thank you for your code, but it's too much noisy, in normal situations I am not
willing to write similar code.

------------------------

Jason House:

pure is a keyword in D2 that means something different than how you're using
it.<

D2 compiler contains the idea of "pure function" and "pure method", while I am
talking about "pure generator", an idea that is not present natively in the D2
compiler.


I'm also fairly ignorant about what is in std.algorithm, but you should be able
to find things that look more like your functional code examples.<

You are probably right here, but this is beside the main point of my post.

------------------------

Pelle:

It is stored as a list, and is in effect memoized.<

In practice I think Haskell doesn't memoize things there. I think it optimizes
most things away. The main point of pure generators is indeed that they give
the compiler enough guarantees (just as pure functions) that allow it to
perform those optimizations safely. The point of my original post is that I'd
like a way to give the D compiler such semantic information, mostly for
optimization purposes (but there are other purposes, similarly to the purposes
of pure functions).


You can, naturally, create the same in D. It will be more verbose, though.
[...] That should be about the same. :)<

Beside being not a pure generator (a semantics that currently can't be
expressed to the D2 compiler), your code has very little to do with the way
Haskell compiles that program. I can be wrong because I am don't know much
about the Haskell compiler(s).

Bye,
bearophile

May 13 2010

"Simen kjaeraas" <simen.kjaras gmail.com> writes:

bearophile <bearophileHUGS lycos.com> wrote:

 This has to do with D iteration being more designed towards mimicking C  
 arrays.<

 I am not able to see the link between the two things.

An array is something you can use several times. The opApply
functionality is based on this behavior (not necessarily, but the common
usage, and specifically your usage in the provided code).

For something more like the Python generators, look at D2 ranges. An
example of those, for this specific usage, is included below.

 (x for x in xrange(2,N) if all(x % i for i in xrange(2,x)))

 Except that it also changes the input.

 The input there is N, and it doesn't get changed.
 You can see a pure generator like a pure function that returns an array  
 of items. The main difference is that the items of such array are  
 computed & given lazily.

Your input to creating the generator, yes. After that, the generator is
its own input. So the constructor is pure, its workings beyond that
point are not.

The equivalent in D would be a struct with a member function that does
not change any external variables, but does change internal state. Not
pure the way D has defined that word (Calling the function several
times with the same input does not yield the same output, unless you're
willing to call a different input with the same name the same input.).

 Here's an implementation that seems to work:<

 The code worth traslating in D2 was the Haskell Fibonacci generator.
 Thank you for your code, but it's too much noisy, in normal situations I  
 am not willing to write similar code.

Now, the right way to write a Fibonacci generator in D is this:

import std.range;
...
auto fib = recurrence!("a[n-1] + a[n-2]")(1, 1);

Not as aesthetic as the equivalent Haskell code, I will admit.

Also, updated code below. It's now a range, and supports much cleaner
initialization and use:

import core.thread;
import std.range;
import std.stdio;
import std.array;
import std.math;

class YieldingFiber( T ) : Fiber {
   this( void function( ) fn ) {
     super( fn );
     call( );
   }

   this( void delegate( ) fn ) {
     super( fn );
     call( );
   }

   T call( ) {
     if ( auto o = super.call( ) ) {
       throw o;
     }
     return _value;
   }

    property
   T front( ) const {
     return _value;
   }

   alias call popFront;

    property
   bool empty( ) {
     return state == Fiber.State.TERM;
   }

   static void yield( T value ) {
     if ( auto p = cast( typeof( this ) )getThis( ) ) {
       p._value = value;
     } else {
       throw new Exception( "Invalid yield type." );
     }
     Fiber.yield( );
   }
private:
   T _value;
}

void yield( T )( T value ) {
   YieldingFiber!( T ).yield( value );
}

YieldingFiber!( T ) generator( T, U )( U fn ) {
   return new YieldingFiber!( T )( fn );
}
// Non-reusable code from here.
void primes( ) {
   loop:
   for ( int x = 2; ; ++x ) {
     foreach ( i; 2..ceil( sqrt( x + 0.0 ) ) ) {
       if ( !( x % i ) ) {
         continue loop;
       }
     }
     yield( x );
   }
}

void main( ) {
   auto p = generator!( int )( &primes );
   writeln( "Primes:" );
   writeln( array( take( p, 20 ) ) );
   auto fib = generator!int({ int a = 1, b = 1; while ( 1 ){ yield( a );
	int c = a+b; a = b; b = c; } });
   writeln( "Fibonacci:" );
   writeln( array( take( fib, 20 ) ) );
}

Please note that the YieldingFiber class and the yield and generator
functions, are generic and not something one would need to reimplement each
time one wants a generator.

-- 
Simen

May 13 2010

Pelle <pelle.mansson gmail.com> writes:

On 05/13/2010 01:22 PM, bearophile wrote:
 Pelle:

 It is stored as a list, and is in effect memoized.<

 In practice I think Haskell doesn't memoize things there. I think it optimizes
most things away. The main point of pure generators is indeed that they give
the compiler enough guarantees (just as pure functions) that allow it to
perform those optimizations safely. The point of my original post is that I'd
like a way to give the D compiler such semantic information, mostly for
optimization purposes (but there are other purposes, similarly to the purposes
of pure functions).

Yes, it does. Try it!

Prelude> let fibs = 1 : 1 : zipWith (+) fibs (tail fibs)
Prelude> fibs !! 80000

... some time passes, a huge number appears.

Prelude> fibs !! 80000

... almost instantly, the same number.

Edit the 80000 to a large enough number to slow your system down.

Note that all numbers below the large number chosen is very fast as 
well, once the large calculation is done.

May 13 2010

Bane <branimir.milosavljevic gmail.com> writes:

2 words (not relating to true meaning of this post):

Crap engine.

My first association seeing 'pure generators' subject. Is there better way to
counter balance them, and make a useful machine, running on type of fuel that
is in abundance here where we live? Better than perpetuum mobile.

May 13 2010

Graham Fawcett <fawcett uwindsor.ca> writes:

Hi bearophile,

On Wed, 12 May 2010 19:46:30 -0400, bearophile wrote:

 In Haskell many "generators" are pure, this allows to do things
 like:

 fibonacci = 1 : 1 : zipWith (+) fibonacci (tail fibonacci) fibs = take
 20 fibonacci
 main = print fibs

It's not purity that makes the this code possible in Haskell, it's
lazy evaluation.

 That prints:
 [1,1,2,3,5,8,13,21,34,55,89,144,233,377,610,987,1597,2584,4181,6765]
 
 The meaning of that little Haskell program is simple: fibonacci is a
 symbol, it can be seen as a lazy generator that yields a list of
 numbers. The first and second are 1 (the colon stands for list item
 concatenation), to it concatenates the zipping between the result of the
 generator and another "copy" of the same generator minus the first item.
 
 You can write the same algorithm in Python using lazy iterators:
 
 from itertools import izip, islice
 
 def fibonacci():
     yield 1
     yield 1
     for nm1, n in izip(fibonacci(), islice(fibonacci(), 1, None)):
         yield nm1 + n
 
 fibs = islice(fibonacci(), 0, 20)
 print list(fibs)

The comparison is unfair, because those two programs don't do the same
thing -- it looks like they do, but that's a surface similarity.

In the Python code, 'fibonacci' is a constructor that returns a
generator. The body of fibonacci() does not define a function, but
rather a coroutine. Calling the fibonacci() constructor "recursively"
leads to a quadratic growth of generator instances. If the stack
didn't blow so quickly, the heap would soon follow.

In the Haskell code, 'fibonacci' is a list instance: a lazy, infinite
list. The list refers to itself: there is only one instance of it.
This is not equivalent to calling a recursive function; rather you are
creating a recursive, lazy data structure. Because you maintain a
reference to the head of this list, the heap will be consumed --
linearly -- as the tail of list is incrementally evaluated.

 Even if doesn't show, the Haskell compiler is able to optimize quite
 well that little program, it can generate many Fibonacci numbers in a
 short time, while that Python code is really slow, uses tons of RAM, and
 it soon blows up the stack. (Lazyness in Python is easy to use, but it
 was bolted on Python at the last minute and it's not always good
 enough).

Don't blame Python; you're just doing it wrong. :) Python generators
are not lazy lists, and they aren't even functions, they are
coroutines. Apples, oranges, and lemons. The Fibonacci sequence can be
implemented succinctly and efficiently as a Python generator:

def fibonacci():
    a, b = 1, 1
    while True:
        yield a
        a, b = b, a + b

If you use that definition in the following program, which prints the
infinite sequence:

for n in fibonacci():
    print n

...it will run as fast as Python can add two bignums, and it will run
in constant space.

Graham

May 13 2010

bearophile <bearophileHUGS lycos.com> writes:

Graham Fawcett:

 def fibonacci():
     a, b = 1, 1
     while True:
         yield a
         a, b = b, a + b

This is my version I use (doctests and docstring removed), a bit faster with
Python 2.x:

def xfibonacci():
    a, b = 0, 1
    while 1:
        yield b
        a += b
        yield a
        b += a

Thank you for your explanations. I am too much ignorant still to talk about
those things :-)

Bye,
bearophile

May 13 2010

D Programming

C/C++ Programming

Other

digitalmars.D - pure generators