
digitalmars.D - lazy thoughts

reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
(I originally emailed this to Walter and a couple others, but I thought 
it might be better to gather insights from the community.)

I'm gearing up for changing std.algorithm, and I am thinking of making 
the major change of favoring lazy evaluation throughout. For example, 
consider the map function. Right now map allocates and returns a new 
vector, for example:

int[] arr = [ 1, 2, 3, 4 ];
auto squares = map!("a * a")(arr);
assert(squares == [ 1, 4, 9, 16 ]);

What happens is unfortunate because (1) all evaluation is done upfront 
even if it is only partially needed, (2) allocation is done every call, 
(3) the whole scheme won't work with unbounded inputs, e.g. generators 
or even large files.

So now that we have nice range support in the language, I'm thinking 
very seriously of switching full-bore to lazy semantics. In that case 
map returns a range - a little struct that saves the input and trickles 
out results one at a time.
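To make the shape of that little struct concrete, here is an illustrative sketch in Python (not the std.algorithm implementation; the empty/head/next names merely mirror the range vocabulary under discussion). The struct saves a reference to its input and computes each result only when asked:

```python
class Map:
    """A little struct that saves its input and trickles out results.

    empty/head/next are illustrative names echoing the range discussion,
    not a final API.
    """
    def __init__(self, fn, source):
        self.fn = fn
        self.source = source   # saved by reference, not copied
        self.pos = 0

    def empty(self):
        return self.pos >= len(self.source)

    def head(self):
        # The transformation runs only when an element is demanded.
        return self.fn(self.source[self.pos])

    def next(self):
        self.pos += 1

arr = [1, 2, 3, 4]
squares = Map(lambda a: a * a, arr)

out = []
while not squares.empty():
    out.append(squares.head())
    squares.next()
assert out == [1, 4, 9, 16]
```

Note that the source is saved, not copied; that is what makes the escaping question raised later in the thread worth asking.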

One nice thing about lazy is that lazy includes eager. If you actually 
want to "eagerize" map, you just call eager against the returned range:

int[] arr = [ 1, 2, 3, 4 ];
auto squares = eager(map!("a * a")(arr));
assert(squares == [ 1, 4, 9, 16 ]);

The real advantage comes when you actually exploit the laziness, e.g.:

int[] arr = [ 1, 2, 3, 4 ];
auto squares = map!("a * a")(arr);
foreach (x; squares) {
     writeln(x);
}

Look ma, no allocation!
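Points (1) and (3) can be demonstrated directly with a lazy pipeline. A sketch in Python (whose generators behave like the proposed ranges): partial consumption does partial work, and an unbounded input is no problem at all.

```python
from itertools import count, islice

evaluated = []

def square(a):
    evaluated.append(a)   # record which inputs were actually evaluated
    return a * a

# Unbounded input: a lazy map over all positive naturals.
squares = (square(a) for a in count(1))

# Consume only the first three results; only three evaluations happen.
assert list(islice(squares, 3)) == [1, 4, 9]
assert evaluated == [1, 2, 3]
```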

I just wanted to gauge your opinion on this. It is a major departure 
from the STL, and I think in the right direction. It is a departure 
nonetheless and some STL users might feel uncomfortable.

Also, lazy evaluation has the risk of getting confusing as there's a lot 
of escaping. Consider:

int[] arr = [ 1, 2, 3, 4 ];
auto squares = map!("a * a")(arr);
arr[] = [ 5, 6, 7, 8 ];

Now iterating squares will see different numbers than the original ones.
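The escaping hazard is easy to reproduce in any lazy setting. In Python 3 terms (a sketch of the behavior, not of D's exact semantics), the lazy map holds a reference to the array, so a later in-place overwrite shows through:

```python
arr = [1, 2, 3, 4]
squares = map(lambda a: a * a, arr)   # lazy in Python 3: holds a reference

arr[:] = [5, 6, 7, 8]                 # in-place overwrite, like D's arr[] = ...

# Iterating now sees the new contents, not the ones present at creation.
assert list(squares) == [25, 36, 49, 64]
```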

Please let me know what you think!


Andrei
Jan 12 2009
next sibling parent reply Daniel Keep <daniel.keep.lists gmail.com> writes:
Andrei Alexandrescu wrote:
 [snip]
I like the idea of lazy evaluation; heck, I toyed around with stack threads so I could write generators.
 Also, lazy evaluation has the risk of getting confusing as there's a lot 
 of escaping. Consider:
 
 int[] arr = [ 1, 2, 3, 4 ];
 auto squares = map!("a * a")(arr);
 arr[] = [ 5, 6, 7, 8 ];
 
 Now iterating squares will see different numbers than the original ones.
 
 Please let me know what you think!
 
 Andrei
Maybe map should only take invariant data; that way, it's fairly obvious to the user that you need to pass a block of data that isn't going to be changed under the map's feet.

  -- Daniel
Jan 12 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Daniel Keep wrote:
 
 
 Andrei Alexandrescu wrote:
 [snip]
I like the idea of lazy evaluation; heck, I toyed around with stack threads so I could write generators.
 Also, lazy evaluation has the risk of getting confusing as there's a 
 lot of escaping. Consider:

 int[] arr = [ 1, 2, 3, 4 ];
 auto squares = map!("a * a")(arr);
 arr[] = [ 5, 6, 7, 8 ];

 Now iterating squares will see different numbers than the original ones.

 Please let me know what you think!

 Andrei
Maybe map should only take invariant data; that way, it's fairly obvious to the user that you need to pass a block of data that isn't going to be changed under the map's feet.
That would reduce map's applicability considerably. Andrei
Jan 12 2009
parent reply BCS <ao pathlink.com> writes:
Reply to Andrei,

 Daniel Keep wrote:
 
 Maybe map should only take invariant data; that way, it's fairly
 obvious to the user that you need to pass a block of data that isn't
 going to be changed under the map's feet.
 
That would reduce map's applicability considerably. Andrei
maybe require an explicit opt-in for lazy if the underlying stuff isn't invariant?
Jan 12 2009
parent Daniel Keep <daniel.keep.lists gmail.com> writes:
BCS wrote:
 Reply to Andrei,
 
 Daniel Keep wrote:

 Maybe map should only take invariant data; that way, it's fairly
 obvious to the user that you need to pass a block of data that isn't
 going to be changed under the map's feet.
That would reduce map's applicability considerably. Andrei
maybe require an explicit opt-in for lazy if the underlying stuff isn't invariant?
Perhaps have map and mapUnsafe? Or mapMutable?

Honestly, I can't think of a case where this has been a problem for me with Python, for what it's worth.

  -- Daniel
Jan 12 2009
prev sibling next sibling parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 (I originally emailed this to Walter and a couple others, but I thought
 it might be better to gather insights from the community.)
 I'm gearing up for changing std.algorithm, and I am thinking of making
 the major change of favoring lazy evaluation throughout. [snip]
I absolutely love it! Frankly, the convenience of std.algorithm is the greatest thing since sliced arrays, but all the memory allocation is sometimes pretty inefficient, so I end up coding lots of stuff at a lower level to avoid this.

One thing, though, is that I would like to see eager() know whether whatever it's eager-izing has a predetermined length, and if so, what that predetermined length is, so it can get by with a single allocation.

As far as the confusingness of lazy evaluation, it might take some getting used to, but the really hard cases could be solved by just using eager() anyhow, and we wouldn't be any worse off than if lazy weren't supported.

Really, the only downside I see is slightly clumsier syntax when eager evaluation is needed. IMHO, this could be improved if eager() were, at least syntactically, a property of ranges, for example:

map!("a * a")(arr).eager();
Jan 12 2009
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
dsimcha wrote:
 == Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
[snip]
 I absolutely love it! [snip] One thing, though, is that I would like to
 see eager() know whether whatever it's eager-izing has a predetermined
 length, and if so, what that predetermined length is, so it can get by
 with a single allocation.
Great. Fortunately that will be elegantly supported by the range design: if the range passed to map supports .length, the range returned by map will also support it (and the implementation will simply do the forwarding). Consequently, eager will detect a range that also supports .length, in which case it only does one allocation.

Gotta love static if and is(expression). For years people didn't even know whether or not it's possible to detect (in C++) the existence of a member, let alone the validity of an arbitrary expression. Today detecting the existence of a member is possible, but in a very inconvenient and limited manner.
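The detect-and-forward scheme can be sketched outside D as well. In this illustrative Python stand-in, the detection done with static if / is(...) in D is done with a caught TypeError: when the source advertises a length, the map result forwards it, and eager can size its buffer in a single allocation.

```python
class MapRange:
    """Lazy map that forwards length from its source (illustrative only)."""
    def __init__(self, fn, source):
        self.fn, self.source = fn, source

    def __iter__(self):
        return (self.fn(x) for x in self.source)

    def __len__(self):
        # Forwarding: raises TypeError if the source has no length.
        return len(self.source)

def eager(r):
    """Force a lazy range into an array."""
    try:
        n = len(r)               # does the range support .length?
    except TypeError:
        return list(r)           # unknown length: grow as we go
    buf = [None] * n             # single allocation of exactly the right size
    for i, x in enumerate(r):
        buf[i] = x
    return buf

m = MapRange(lambda a: a * a, [1, 2, 3, 4])
assert len(m) == 4                       # .length forwarded from the source
assert eager(m) == [1, 4, 9, 16]         # one allocation, then fill
assert eager(x * x for x in [1, 2]) == [1, 4]   # no length: fallback path
```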
 As far as the confusingness of
 lazy evaluation, it might take some getting used to, but the really hard cases
 could be solved by just using eager() anyhow, and we wouldn't be any worse off
 than if lazy weren't supported.
 
 Really, the only downside I see is slightly clumsier syntax when eager
evaluation
 is needed.  IMHO, this could be improved if eager() were, at least
syntactically,
 a property of ranges, for example:
 
 map!("a * a")(arr).eager();
How about dropping those parens :o). Andrei
Jan 12 2009
parent "Denis Koroskin" <2korden gmail.com> writes:
On Mon, 12 Jan 2009 20:32:14 +0300, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:

 dsimcha wrote:
 == Quote from Andrei Alexandrescu's article
 [snip] One thing, though, is that I would like to see eager() know
 whether whatever it's eager-izing has a predetermined length, and if
 so, what that predetermined length is, so it can get by with a single
 allocation.
Great. Fortunately that will be elegantly supported by the range design: if the range passed to map supports .length, the range returned by map will also support it (and the implementation will simply do the forwarding). Consequently, eager will detect a range that also supports .length, in which case it only does one allocation.
Nice, but length might not always be known. How about relaxing the restriction that length is either known and fixed, or unknown? It could be defined as "0 if the range is empty, otherwise a minimum number of elements within the range". A socket stream is an example of an input range that doesn't know its total length but has a "minimum number of elements left" property.

This way you could check range.length, allocate a buffer of sufficient size, process elements, and check range.length again. Break if it is empty; continue otherwise:

T[] eager(Generator)(Generator gen)
{
    T[] result;
    while (true)
    {
        int n = gen.length;    // minimum number of elements available
        if (n == 0)
        {
            break;
        }
        int lengthSoFar = result.length;
        int newLength = lengthSoFar + n;
        result.length = newLength;
        foreach (i; lengthSoFar .. newLength)
        {
            result[i] = gen();
        }
    }
    return result;
}
 [snip]
  map!("a * a")(arr).eager();
How about dropping those parens :o). Andrei
No way!
Jan 12 2009
prev sibling parent BCS <ao pathlink.com> writes:
Reply to dsimcha,

 One thing, though, is that I would like to
 see eager() know whether whatever it's eager-izing has a predetermined
 length, and if so, what that predetermined length is, so it can get by
 with a single allocation. 
It would be really cool if this could be, where possible, static; then eager could in some cases use NRVO and get no calls to malloc at all.
Jan 12 2009
prev sibling next sibling parent reply Sergey Gromov <snake.scaly gmail.com> writes:
Mon, 12 Jan 2009 09:05:18 -0800, Andrei Alexandrescu wrote:

 Also, lazy evaluation has the risk of getting confusing as there's a lot 
 of escaping. Consider:
 
 int[] arr = [ 1, 2, 3, 4 ];
 auto squares = map!("a * a")(arr);
 arr[] = [ 5, 6, 7, 8 ];
 
 Now iterating squares will see different numbers than the original ones.
This part bothers me, though it's hard to tell how much danger it would pose in practice.
Jan 12 2009
next sibling parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from Sergey Gromov (snake.scaly gmail.com)'s article
 Mon, 12 Jan 2009 09:05:18 -0800, Andrei Alexandrescu wrote:
 [snip]
This part bothers me. Though it's hard to tell how much danger it would impose in practice.
I don't understand how this is any worse/more confusing than any other case of having multiple aliases to the same memory region.
Jan 12 2009
parent reply Sergey Gromov <snake.scaly gmail.com> writes:
Mon, 12 Jan 2009 17:38:40 +0000 (UTC), dsimcha wrote:

 == Quote from Sergey Gromov (snake.scaly gmail.com)'s article
 Mon, 12 Jan 2009 09:05:18 -0800, Andrei Alexandrescu wrote:
 [snip]
I don't understand how this is any worse/more confusing than any other case of having multiple aliases to the same memory region.
First, it's an obscure alias. Second, it wasn't so in the previous implementation of std.algorithm. OTOH these are mitigated by the fact that you're going to re-visit the map!() usage in your code anyway.

Also, I've got the impression that Phobos tried to avoid dangerous aliasing as much as possible by using immutable arguments in aliasing algorithms. map() obviously refuses to follow this rule.

On an unrelated note, I probably should make a post about Phobos overusing immutable arguments. This is a direct consequence of declaring string as an alias of immutable(char)[], which is IMHO plain wrong.
Jan 12 2009
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Sergey Gromov wrote:
 Mon, 12 Jan 2009 17:38:40 +0000 (UTC), dsimcha wrote:
 
 [snip]
First, it's an obscure alias. Second, it wasn't so in the previous implementation of std.algorithm. OTOH these are mitigated by the fact you're going to re-visit the map!() usage in your code anyway.
Correct. Besides, laziness enables programming patterns and styles that are very useful. For example, after the overhaul of std.algorithm, a plethora of generators will come forth.
 Also I've got an impression that Phobos tried to avoid dangerous
 aliasing as much as possible by using immutable argiments in aliasing
 algorithms.  map() obviously refuses to follow this rule.
Well it doesn't "refuse" as that suggests confrontation. It simply acknowledges that oftentimes map's functionality is useful over mutable inputs.
 On an unrelated note, I probably should make a post about Phobos
 overusing immutable arguments.  This is a direct consequence of
 declaring string as an alias of immutable(char)[] which is IMHO plain
 wrong.
This shapes up as a basic disagreement, but to a large extent we can work things out. For example, a generic string function such as startsWith(someString, prefix) should not require invariant inputs when it could work just as well for mutable arrays. Then everybody is happy. Andrei
Jan 12 2009
prev sibling parent Robert Fraser <fraserofthenight gmail.com> writes:
Sergey Gromov wrote:
 Mon, 12 Jan 2009 09:05:18 -0800, Andrei Alexandrescu wrote:
 
 Also, lazy evaluation has the risk of getting confusing as there's a lot 
 of escaping. Consider:

 int[] arr = [ 1, 2, 3, 4 ];
 auto squares = map!("a * a")(arr);
 arr[] = [ 5, 6, 7, 8 ];

 Now iterating squares will see different numbers than the original ones.
This part bothers me. Though it's hard to tell how much danger it would impose in practice.
I could also see it being useful in some situations. But, yes, it is unexpected.
Jan 12 2009
prev sibling next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:
 I'm gearing up for changing std.algorithm, and I am thinking of making 
 the major change of favoring lazy evaluation throughout. For example, 
 consider the map function. Right now map allocates and returns a new 
 vector, for example:
[...]
 int[] arr = [ 1, 2, 3, 4 ];
 auto squares = eager(map!("a * a")(arr));
 assert(squares == [ 1, 4, 9, 16 ]);
As I have told you in the past, it's good for D to embrace more lazy operations, so I agree with this. For example, from the string module of my dlibs you can see how many times faster xsplit() is than split().

In my dlibs I have map/xmap, filter/xfilter, etc.; the x denotes the lazy versions. So I keep both versions, and I think this is good.

In my libs the eager() is named array(), which I think is a better name, and I presume will lead to fewer typos. The implementation of array() is quite reliable and efficient, so you may take a look at it.

Bye,
bearophile
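The split/xsplit speedup comes from the same principle being proposed for map: the lazy version never builds an intermediate array of slices. A rough Python equivalent of such an xsplit (a hypothetical sketch, not the dlibs code):

```python
def xsplit(s, sep=" "):
    """Lazily yield the pieces of s, one slice at a time."""
    start = 0
    while True:
        i = s.find(sep, start)
        if i == -1:
            yield s[start:]    # final piece (possibly empty)
            return
        yield s[start:i]
        start = i + len(sep)

pieces = xsplit("a b c", " ")
assert next(pieces) == "a"           # only the first piece was produced
assert list(pieces) == ["b", "c"]    # the rest trickle out on demand
```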
Jan 12 2009
next sibling parent bearophile <bearophileHUGS lycos.com> writes:
You may also want to take a look at xchan() and the template mixin Chainable:

http://www.fantascienza.net/leonardo/so/dlibs/func.html

(And there's lot of other lazy stuff there that you may appreciate.)

Bye,
bearophile
Jan 12 2009
prev sibling parent Robert Fraser <fraserofthenight gmail.com> writes:
bearophile wrote:
 In my libs the eager() is named array() that I think is a better name, and I
presume will lead to less typos.
I like "force," as in (force (delay (map (lambda (x) (* x x)) arr))) (I probably got that wrong; been a while...)
Jan 12 2009
prev sibling next sibling parent BCS <ao pathlink.com> writes:
Reply to Andrei,

 One nice thing about lazy is that lazy includes eager. If you actually
 want to "eagerize" map, you just call eager against the returned
 range:
 
 int[] arr = [ 1, 2, 3, 4 ];
 auto squares = eager(map!("a * a")(arr));
 assert(squares == [ 1, 4, 9, 16 ]);
Nice, that covers my major concern, as long as it's not doing a lazy eval per member even so. Based on that, I'd think making it a property would be better.

For some cases, might the computational overhead be the high-cost bit rather than the alloc? In those cases you might want to have a memoize option as well.
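The memoize option suggested above can be sketched as a caching wrapper around a lazy range: each element is computed at most once, so repeated traversals pay the transformation cost only the first time. A hypothetical Python version:

```python
class Memoized:
    """Cache elements of an iterable as they are first demanded."""
    def __init__(self, source):
        self.it = iter(source)
        self.cache = []

    def __iter__(self):
        i = 0
        while True:
            if i == len(self.cache):
                try:
                    self.cache.append(next(self.it))  # compute on first demand
                except StopIteration:
                    return
            yield self.cache[i]                       # later passes hit cache
            i += 1

calls = 0
def square(a):
    global calls
    calls += 1
    return a * a

m = Memoized(square(a) for a in [1, 2, 3, 4])
assert list(m) == [1, 4, 9, 16]
assert list(m) == [1, 4, 9, 16]   # second traversal served from the cache
assert calls == 4                 # each element computed exactly once
```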
Jan 12 2009
prev sibling next sibling parent reply Brian Palmer <d brian.codekitchen.net> writes:
Andrei Alexandrescu Wrote:

 (I originally emailed this to Walter and a couple others, but I thought 
 it might be better to gather insights from the community.)
 
 I'm gearing up for changing std.algorithm, and I am thinking of making 
 the major change of favoring lazy evaluation throughout. (snip)
I am behind this change 100%. Awesome that range support enabled this functionality.

I haven't used std.algorithm in a few months, so I apologize if I'm out of date here, but you might want to start thinking about stream fusion too -- "fusing" multiple std.algorithm calls (map, fold, etc.) together so that you can pipeline computations and only ever allocate to get the final result. Then with range support in streams, we could read in a stream, apply multiple transformations via std.algorithm, and then write to another stream, all without extra allocation for intermediate results!
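Fusion falls out naturally once every stage is lazy: composing lazy stages builds a pipeline in which each element flows through all transformations with no intermediate collection. A small Python sketch of that shape:

```python
# Three lazy stages composed into one pipeline; no intermediate lists exist.
nums = iter(range(1, 7))                     # a "stream" source: 1..6
doubled = (x * 2 for x in nums)              # map stage
kept = (x for x in doubled if x % 3 != 0)    # filter stage
total = sum(kept)                            # the fold drives the pipeline

# Elements trickle through one at a time: 2, 4, 8, 10 survive the filter.
assert total == 2 + 4 + 8 + 10
```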
Jan 12 2009
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Brian Palmer wrote:
 Andrei Alexandrescu Wrote:
 
 [snip] you might want to start thinking about stream fusion too --
 "fusing" multiple std.algorithm calls (map, fold, etc.) together so
 that you can pipeline computations and only ever allocate to get the
 final result. [snip]
Good point. Yes, the overhaul of std.algorithm in the wake of streams will be accompanied by an overhaul of std.stdio (work on files as streams), std.random (random generators as streams), std.format (output formatted objects straight to streams), and probably a couple of others. Andrei
Jan 12 2009
prev sibling next sibling parent Max Samukha <samukha voliacable.com.removethis> writes:
On Mon, 12 Jan 2009 09:05:18 -0800, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:

(I originally emailed this to Walter and a couple others, but I thought 
it might be better to gather insights from the community.)

I'm gearing up for changing std.algorithm, and I am thinking of making 
the major change of favoring lazy evaluation throughout. [snip]
That's just great. If the ranges design is settled, could you please release an updated RFC?
Jan 12 2009
prev sibling next sibling parent reply noobie <a b.com> writes:
Andrei Alexandrescu Wrote:

 int[] arr = [ 1, 2, 3, 4 ];
 auto squares = map!("a * a")(arr);
 arr[] = [ 5, 6, 7, 8 ];
 
 Now iterating squares will see different numbers than the original ones.
Okay, what is the problem in maintaining a reference to the original array values?
Jan 12 2009
next sibling parent BCS <ao pathlink.com> writes:
Reply to noobie,

 Andrei Alexandrescu Wrote:
 
 int[] arr = [ 1, 2, 3, 4 ];
 auto squares = map!("a * a")(arr);
 arr[] = [ 5, 6, 7, 8 ];
 Now iterating squares will see different numbers than the original
 ones.
 
Okay, what is the problem in maintaining a reference to the original array values?
That is the problem: "arr[] = ..." is a content copy, not a reference copy, so map's saved reference sees the new contents.
Jan 12 2009
prev sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
noobie wrote:
 Andrei Alexandrescu Wrote:
 
 int[] arr = [ 1, 2, 3, 4 ];
 auto squares = map!("a * a")(arr);
 arr[] = [ 5, 6, 7, 8 ];

 Now iterating squares will see different numbers than the original ones.
Okay, what is the problem in maintaining a reference to the original array values?
Efficiency. You'd have to copy the whole range, and in fact deep copy it. Andrei
Jan 12 2009
prev sibling next sibling parent reply Bill Baxter <wbaxter gmail.com> writes:
On Tue, Jan 13, 2009 at 2:05 AM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
 (I originally emailed this to Walter and a couple others, but I thought it
 might be better to gather insights from the community.)

 I'm gearing up for changing std.algorithm, and I am thinking of making the
 major change of favoring lazy evaluation throughout. [snip]

 Please let me know what you think!
I think having such functions is great. I'm not so sure about removing the others and replacing them wholesale with these.

In Python they have had two versions of most functions, like range vs xrange. The former returns an array with the result, the latter returns a lazy iterable. In Python 3.0 they have done away with most of the x__ versions and made those the default. So you might say, A-ha! the array versions weren't necessary! But the difference is that Python cares more about simplicity than performance, so a little loss in speed in exchange for only-one-way-to-do-it is acceptable there. I'm just guessing there will be some penalty for doing everything lazily in the case where you know you want a full array. More data is needed on that, I think.

But even assuming it is a free lunch, I'd want a better way to make an array than putting .eager on everything. First off, that's not even remotely a verb (like .eval() or .exec()), nor is it a noun that clearly expresses the type it will become (a la array() or toArray()). Second, it's just a lot of typing for something that could be a pretty common case. I'd be happier with a one-letter difference, like Python had. But making the defaults the other way would be fine by me: map -> lazy map, emap -> eager map.

--bb
Jan 12 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Bill Baxter wrote:
 On Tue, Jan 13, 2009 at 2:05 AM, Andrei Alexandrescu
 [snip]
 Please let me know what you think!
I think having such functions is great. I'm not so sure about removing the others and replacing them wholesale with these. In Python they have had two versions of most functions, like range vs xrange. The former returns an array with the result, the latter returns a lazy iterable. In the Python 3.0 they have done away with most of the x__ versions and made those default. So you might say A-ha! the array versions weren't necessary! But the difference is that Python cares more about simplicity than performance, so a little loss in speed in exchange for only-one-way-to-do-it is acceptable there.
Here it's not as much about efficiency as about orthogonality. We know how to (lazy) map, we know how to force a computation, so why not allow people to mix and match them (as opposed to providing one-liners in the standard library).
 But I'm just guessing there will be some penalty for doing everything
 lazily in the case where you know you want a full array.  More data is
 needed on that I think.
 
 But even assuming it's is a free lunch, I'd want a better way make an
 array than putting .eager on everything.  First off, that's not even
 remotely a verb (like .eval() or .exec()), nor is it a noun that
 clearly expresses the type it will become (ala array() or toArray()).
I believe (without having measured... which means that I am essentially lying) that we can safely assume the lunch will be free or low cost. The copying of the underlying range should be cheap except for the shortest ranges.
 Second it's just a lot of typing for something that could be a pretty
 common case, still.   I'd be happier with a one letter difference,
 like Python had.  But making the defaults the other way would be fine
 by me,   map -> lazy map   emap -> eager map.
Yah, I'll look into it. One thing is that not that many functions will end up being lazy. For example, reduce is eager (I think it's uninteresting to have it default to laziness), and so will be the sorting and searching functions. In fact, map and filter are the poster children of lazy evaluation in today's std.algorithm. Of course there will be other lazy functions now that the door is open, such as Haskell's "take" etc.

Andrei
Jan 12 2009
next sibling parent reply Bill Baxter <wbaxter gmail.com> writes:
On Tue, Jan 13, 2009 at 6:42 AM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
 Bill Baxter wrote:
 On Tue, Jan 13, 2009 at 2:05 AM, Andrei Alexandrescu
 [snip]

 Please let me know what you think!
I think having such functions is great. I'm not so sure about removing the others and replacing them wholesale with these. In Python they have had two versions of most functions, like range vs xrange. The former returns an array with the result, the latter returns a lazy iterable. In the Python 3.0 they have done away with most of the x__ versions and made those default. So you might say A-ha! the array versions weren't necessary! But the difference is that Python cares more about simplicity than performance, so a little loss in speed in exchange for only-one-way-to-do-it is acceptable there.
Here it's not as much about efficiency as about orthogonality. We know how to (lazy) map, we know how to force a computation, so why not allow people to mix and match them (as opposed to providing one-liners in the standard library).
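The mix-and-match idea maps directly onto Python 3, which the thread keeps comparing against: there, map is lazy and list() plays the role of the proposed eager(). A small sketch (Python, not D):

```python
# Python 3's map is lazy, like the proposed D map; list() forces it,
# playing the role of eager()/toArray().
squares = map(lambda a: a * a, [1, 2, 3, 4])  # no work done yet
assert list(squares) == [1, 4, 9, 16]         # forcing makes it eager

# Once consumed, the lazy iterator is exhausted (single-pass):
assert list(squares) == []
```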
 But I'm just guessing there will be some penalty for doing everything
 lazily in the case where you know you want a full array.  More data is
 needed on that I think.

 But even assuming it's is a free lunch, I'd want a better way make an
 array than putting .eager on everything.  First off, that's not even
 remotely a verb (like .eval() or .exec()), nor is it a noun that
 clearly expresses the type it will become (ala array() or toArray()).
I believe (without having measured... which means that I am essentially lying) that we can safely assume the lunch will be free or low cost. The copying of the underlying range should be cheap except for the shortest ranges.
How about the loss of ability to inline the transformation function? I don't know if your current eager implementation can inline or not, but it could be a performance factor. But I would certainly like for you to be right about it being a free or very cheap lunch.
 Second it's just a lot of typing for something that could be a pretty
 common case, still.   I'd be happier with a one letter difference,
 like Python had.  But making the defaults the other way would be fine
 by me,   map -> lazy map   emap -> eager map.
Yah, I'll look into it. One thing is that not that many functions will end up being lazy. For example, reduce is eager (I think it's uninteresting to have it default to laziness). So will be sorting and searching functions. In fact map and filter are the poster children of lazy evaluation in today's std.algorithm. Of course there will be other lazy functions now that the door is open, such as Haskell's "take" etc.
Another commonly used func from Python is zip(). Not sure if that one is doable in D because it relies heavily on Python's tuples to work, but in Python it offers a very clean solution to iterating over several lists at once.

     for xy in zip([1,2,3],[4,5,6]):
        # xy gets sequence of tuples (1,4), (2,5), (3,6)

     for x,y in zip([1,2,3],[4,5,6]):
        # tuple split into x and y automatically

enumerate() is another one; enumerate(myarray) is basically zip(range(0,myarray.length), myarray). This eliminates the need for different overloads of opApply that differ only by whether or not they provide a counter. Also I really hope we'll be seeing lazy versions of the associative array properties .keys and .values! --bb
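For reference, both functions are lazy iterators in Python 3; a minimal runnable sketch of the behavior Bill describes:

```python
# zip pairs up elements lazily; wrapping in list() shows the results.
assert list(zip([1, 2, 3], [4, 5, 6])) == [(1, 4), (2, 5), (3, 6)]

# enumerate(xs) is essentially zip(range(len(xs)), xs):
xs = ["a", "b", "c"]
assert list(enumerate(xs)) == list(zip(range(len(xs)), xs))
```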
Jan 12 2009
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Bill Baxter wrote:
 Second it's just a lot of typing for something that could be a pretty
 common case, still.   I'd be happier with a one letter difference,
 like Python had.  But making the defaults the other way would be fine
 by me,   map -> lazy map   emap -> eager map.
Yah, I'll look into it. One thing is that not that many functions will end up being lazy. For example, reduce is eager (I think it's uninteresting to have it default to laziness). So will be sorting and searching functions. In fact map and filter are the poster children of lazy evaluation in today's std.algorithm. Of course there will be other lazy functions now that the door is open, such as Haskell's "take" etc.
Another commonly used func from Python is zip(). Not sure if that one is doable in D because it relies heavily on Python's tuples to work, but in Python it offers a very clean solution to iterating over several lists at once.

     for xy in zip([1,2,3],[4,5,6]):
        # xy gets sequence of tuples (1,4), (2,5), (3,6)

     for x,y in zip([1,2,3],[4,5,6]):
        # tuple split into x and y automatically
zip() is absolutely on the list.
 enumerate() is another one, enumerate(myarray)  is basically
 zip(range(0,myarray.length), myarray).  This eliminates the need for
 different overloads of opApply that differ only by whether or not they
 provide a counter.
That's a great function too!
 Also I really hope we'll be seeing lazy versions of the associative
 array properties .keys and .values!
And that has been a thorn in a thorn-averse part of my anatomy for the longest time. Andrei
Jan 12 2009
parent reply Daniel Keep <daniel.keep.lists gmail.com> writes:
Andrei Alexandrescu wrote:
 Bill Baxter wrote:
 [snip]

 Another commonly used func from Python is zip().  Not sure if that one
 is doable in D because it relies heavily on Python's tuples to work,
 but in Python it offers a very clean solution to iterating over
 several lists at once.
      for xy in zip([1,2,3],[4,5,6]):
         # xy gets sequence of tuples (1,4), (2,5), (3,6)

      for x,y in zip([1,2,3],[4,5,6]):
         # tuple split into x and y automatically
zip() is absolutely on the list.
How are you going to implement zip? I tried writing a version ages ago, and the issue I ran into is that you cannot support any user-defined iterables with it. The problem is that opApply uses inversion of control to do its work, so you can't run more than one opApply in lockstep unless you're using some form of threading. On a more general note, this is all shaping up to be incredibly awesome; keep up the good work! :D -- Daniel
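The lockstep problem Daniel describes can be sketched in Python: an opApply-style container drives iteration itself through a callback (internal iteration), so there is no cursor to suspend and two such containers cannot be advanced together, whereas pull-style iteration makes zip trivial. A rough illustration, not D code:

```python
# Internal iteration (opApply-style): the container calls us back.
# We never hold a cursor, so two such loops can't interleave without
# coroutines or threads.
def op_apply(data, body):
    for x in data:
        body(x)

collected = []
op_apply([1, 2], collected.append)
assert collected == [1, 2]

# External iteration (range-style): the caller pulls one element at a
# time, so advancing two sources in lockstep is easy.
def lockstep(a, b):
    ia, ib = iter(a), iter(b)
    while True:
        try:
            yield next(ia), next(ib)
        except StopIteration:
            return

assert list(lockstep([1, 2, 3], [4, 5, 6])) == [(1, 4), (2, 5), (3, 6)]
```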
Jan 12 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Daniel Keep wrote:
 Andrei Alexandrescu wrote:
 Bill Baxter wrote:
 [snip]

 Another commonly used func from Python is zip().  Not sure if that one
 is doable in D because it relies heavily on Python's tuples to work,
 but in Python it offers a very clean solution to iterating over
 several lists at once.
      for xy in zip([1,2,3],[4,5,6]):
         # xy gets sequence of tuples (1,4), (2,5), (3,6)

      for x,y in zip([1,2,3],[4,5,6]):
         # tuple split into x and y automatically
zip() is absolutely on the list.
How are you going to implement zip? I tried writing a version ages ago, and the issue I ran into is that you cannot support any user-defined iterables with it. The problem is that opApply uses inversion of control to do its work, so you can't run more than one opApply in lockstep unless you're using some form of threading.
opApply should be first dipped in tar, then dipped in feathers, and finally walked around town in a wooden cage. Of everything in D, I think it's the least in the D spirit. Abstractions that look cute and pretend there's no cost => ~D. zip will be implemented with ranges and std.typecons.Tuple.
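A rough mock-up of what a ranges-based zip could look like, using D's empty/front/popFront range protocol but rendered in Python purely for illustration (the actual D version would yield a std.typecons.Tuple, and the names here follow D's convention, not any shipped API):

```python
class ZipRange:
    """Toy model of a D input range over two random-access sources."""
    def __init__(self, a, b):
        self.a, self.b, self.i = a, b, 0

    @property
    def empty(self):
        # Exhausted as soon as the shorter source runs out.
        return self.i >= min(len(self.a), len(self.b))

    @property
    def front(self):
        return (self.a[self.i], self.b[self.i])

    def popFront(self):
        self.i += 1

z = ZipRange([1, 2, 3], [4, 5, 6])
out = []
while not z.empty:           # the canonical D range-consumption loop
    out.append(z.front)
    z.popFront()
assert out == [(1, 4), (2, 5), (3, 6)]
```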
 In a more general note, this is all shaping up to be incredibly awesome;
 keep up the good work! :D
Thanks. Now I need to find the time to make nice on the promise... lest I get tarred and feathered myself :o). Andrei
Jan 12 2009
parent Jason House <jason.james.house gmail.com> writes:
Andrei Alexandrescu wrote:

 opApply should be first dipped in tar, then dipped in feathers, and
 finally walked around town in a wooden cage. I think it's the least of
 all D in the D spirit. Abstractions that look cute and pretend there's
 no cost => ~D.
I like opApply. From a lazy container writer's viewpoint, it's incredibly easy to implement (we just need to get rid of the int return hack). I'll definitely agree that it can be limited, but it can be very easy and simple. I really hope ranges can obtain that degree of simplicity. I guess time will tell :)
Jan 12 2009
prev sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Bill Baxter:

 I believe (without having measured... which means that I am essentially
 lying) that we can safely assume the lunch will be free or low cost. The
 copying of the underlying range should be cheap except for the shortest
 ranges.
Using array(xmap()) in my dlibs calls the opApply of the xmap, which is measurably slower than using map(), so I keep both. I also keep both because it's handy to have both.
 Another commonly used func from Python is zip().  Not sure if that one
 is doable in D because it relies heavily on Python's tuples to work,
 but in Python it offers a very clean solution to iterating over
 several lists at once.
Take a look at azip, zip and xzip in my dlibs.
 Also I really hope we'll be seeing lazy versions of the associative
 array properties .keys and .values!
Take a look at xkeys and xvalues of my dlibs. My interest for this community and for D is decreasing quickly. Bye, bearophile
Jan 12 2009
next sibling parent reply Jason House <jason.james.house gmail.com> writes:
bearophile wrote:

 My interest for this community and for D is decreasing quickly.
This may be a stupid question, but why is that?
Jan 12 2009
parent reply John Reimer <terminal.node gmail.com> writes:
Hello Jason,

 bearophile wrote:
 
 My interest for this community and for D is decreasing quickly.
 
This may be a stupid question, but why is that?
I'll hazard a guess that it's because some ideas that have originated from the community don't always get recognition by the D team. Later, when the D team independently engineers similar ideas, it is likely to be exasperating to the community members that promoted them in the first place (and worked hard to implement them too). This is just the way D Language development works, and I'm hoping the effect is unintentional. Unfortunately, the "snub" effect will never disappear as long as this community refuses to realize that D development is not driven by the community; the community, as of now, mostly operates as a highly tuned sounding board. By now many people here probably understand the chosen mode of development, but maybe some don't. It's not optimal... it's just the way it is. Incidentally, this is one of the reasons there is still a divergence between community and team developed libraries (ie Tango and Phobos). Er... I suppose I've mentioned this before... :P -JJR
Jan 12 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
John Reimer wrote:
 Hello Jason,
 
 bearophile wrote:

 My interest for this community and for D is decreasing quickly.
This may be a stupid question, but why is that?
I'll hazard a guess that it's because some ideas that have originated from the community don't always get recognition by the D team. Later, when the D team independently engineers similar ideas, it is likely to be exasperating to the community members that promoted them in the first place (and worked hard to implement them too).
This is a tad imprecise. I'll wait for bearophile to tell whether he feels he hasn't gotten the credit he believes he deserves before I answer this particular point. At any rate, I'll be glad if anyone who feels they haven't been credited appropriately for contributions to D would use this forum to say so. I'm big on properly assigning credit where credit is due. But I also feel I can't give credit for something I know does not reflect the process involved, just so they're not unhappy.
 This is just the way D Language development works, and I'm hoping the 
 effect is unintentional.  Unfortunately, the "snub" effect will never 
 disappear as long as this community refuses to realize that D 
 development is not driven by the community; the community, as of now, 
 mostly operates as a highly tuned sounding board.  By now many people 
 here probably understand the chosen mode of development, but maybe some 
 don't.  It's not optimal... it's just the way it is.  Incidentally, this 
 is one of the reasons there is still a divergence between community and 
 team developed libraries (ie Tango and Phobos).
 
 Er... I suppose I've mentioned this before... :P
Many people, when they acquire an idea, understand and internalize it to the point where they essentially reinvent it. Sometimes I noticed this happens to Walter - he hears something without understanding it, thinks it through, and comes back with it after having truly rediscovered it. I've had to ask him explicitly to grant me credit in a few instances, one of which I remember right now being the scope statements. There are a few others that I even know it's useless to ask credit for... such as typelists which got mis-baptized as tuples, foam at my mouth notwithstanding :o). Therefore, I'm sure there are many people who feel that Walter (or Bartosz or myself) should have credited them properly, but weren't friends enough with him/them/us to casually ask for it. This can only be frustrating, so by all means do tell about it. What I'm trying to say is that every person has this and that little quirk, and that Walter is one of the most honest and ethical persons I have ever met. So if there's any odd snubbing effect or whatnot, that is not intentional and could be easily corrected. Andrei
Jan 12 2009
next sibling parent John Reimer <terminal.node gmail.com> writes:
Hello Andrei,

 John Reimer wrote:
 
 Hello Jason,
 
 bearophile wrote:
 
 My interest for this community and for D is decreasing quickly.
 
This may be a stupid question, but why is that?
I'll hazard a guess that it's because some ideas that have originated from the community don't always get recognition by the D team. Later, when the D team independently engineers similar ideas, it is likely to be exasperating to the community members that promoted them in the first place (and worked hard to implement them too).
This is a tad imprecise. I'll wait for bearophile to tell if he feels he hasn't gotten the credit he believes he deserves before I answer this particular point. At any rate, I'll be glad if anyone who does feel not having been credited appropriately for contributions to D could use this forum to tell about it. I'm big on properly assigning credit where credit is due. But I also feel I can't give credit for something I know does not reflect the process involved, just so they're not unhappy.
 This is just the way D Language development works, and I'm hoping the
 effect is unintentional.  Unfortunately, the "snub" effect will never
 disappear as long as this community refuses to realize that D
 development is not driven by the community; the community, as of now,
 mostly operates as a highly tuned sounding board.  By now many people
 here probably understand the chosen mode of development, but maybe
 some don't.  It's not optimal... it's just the way it is.
 Incidentally, this is one of the reasons there is still a divergence
 between community and team developed libraries (ie Tango and Phobos).
 
 Er... I suppose I've mentioned this before... :P
 
Many people, when they acquire an idea, understand and internalize it to the point where they essentially reinvent it. Sometimes I noticed this happens to Walter - he hears something without understanding it, thinks it through, and comes back with it after having truly rediscovered it. I've had to ask him explicitly to grant me credit in a few instances, one of I remember right now being the scope statements. There are a few others that I even know it's useless to ask credit for... such as typelists which got mis-baptized as tuples, foam at my mouth notwithstanding :o). Therefore, I'm sure many people who must also be feeling Walter (or Bartosz or myself) should have credited properly, but weren't friends enough with him/them/us to casually ask for it. This can only be frustrating, so by all means do tell about it. What I'm trying to say is that every person has this and that little quirk, and that Walter is one of the most honest and ethical persons I have ever met. So if there's any odd snubbing effect or whatnot, that is not intentional and could be easily corrected. Andrei
Argh... I think I wrote that poorly. I didn't mean to say that you, Walter, or Bartosz were adopting ideas from the community and taking credit for them. I meant to say that, sometimes, the community will come up with an idea independently (perhaps prior to you gents), and it appears to go unrecognized amid the posts: the effect is that these ideas/implementations seem ignored. Then you guys sometimes end up /independently/ working on similar ideas and implementing them on your own. That's where the other fella can get exasperated. :) Granted it doesn't happen all the time... However, concerning the dynamics of D community versus D team development of D language design, I stand my ground. It's just the way it is. :) -JJR
Jan 12 2009
prev sibling next sibling parent John Reimer <terminal.node gmail.com> writes:
Hello Andrei,

 John Reimer wrote:
 
 Hello Jason,
 
 bearophile wrote:
 
 My interest for this community and for D is decreasing quickly.
 
This may be a stupid question, but why is that?
I'll hazard a guess that it's because some ideas that have originated from the community don't always get recognition by the D team. Later, when the D team independently engineers similar ideas, it is likely to be exasperating to the community members that promoted them in the first place (and worked hard to implement them too).
This is a tad imprecise.
In retrospect, you are right in that I shouldn't have jumped the gun with my hasty guess. My apologies. It would have been better to let bearophile explain the problem himself. -JJR
Jan 12 2009
prev sibling next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:
 I'll wait for bearophile to tell if he feels he 
 hasn't gotten the credit he believes he deserves before I answer this 
 particular point.
I'm having a bad week for matters unrelated to D. You are doing a lot of work for D, so don't worry about me. I was just a bit sad seeing you inventing some things I use often and have shown here a few times. Consider the idea of having both lazy/eager versions of your functors. Consider the idea of having a functionality like the xchain functor and Chainable class mixin of my libs (that is, chaining of arbitrary iterables, both eager and lazy, using ~). I have also created various other things that you may re-invent in the following days and weeks, like Xcast, a way to cast lazily the items of an iterable, etc. One small but important thing I think has to be fixed in D2 is related to associative arrays: iteration has to happen first of all on keys. This gives big practical advantages. In my libs all functions, when confronted with AAs, use their keys first. In my dlibs I have implemented several other things that I hope to see in D2, like record/Record, which is a much better Tuple, better equality testing among arrays, testing and comparison among associative arrays, etc. I have explained such things a few times, but I am willing to explain them again if there's someone willing to listen. Bye, bearophile
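For readers unfamiliar with the functions bearophile lists: his xchain corresponds roughly to what Python's itertools.chain provides (lazy concatenation of arbitrary iterables), and the Haskell-style take mentioned earlier in the thread corresponds to islice. A quick sketch of the idea, in Python rather than his dlibs:

```python
from itertools import chain, islice

# Lazy concatenation of arbitrary iterables (the xchain idea):
assert list(chain([1, 2], (x * x for x in [3, 4]))) == [1, 2, 9, 16]

# Haskell-style take: lazily pull only the first n elements.
assert list(islice(chain([1, 2], [3, 4]), 3)) == [1, 2, 3]
```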
Jan 13 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
bearophile wrote:
 Andrei Alexandrescu:
 I'll wait for bearophile to tell if he feels he hasn't gotten the
 credit he believes he deserves before I answer this particular
 point.
I'm having a bad week for matters unrelated to D. You are doing lot of work for D, so don't worry for me. I was just a bit sad seeing you inventing some things I use often and I have shown here few times. Consider the idea of having both lazy/eager versions of your functors. Consider the idea of having a functionality like the xchain functor and Chainable class mixin of my libs (that is lazy chaining of arbitrary iterables, eager and lazy) using ~. I have also created various other things that you may re-invent in the following days and weeks, like the Xcast, a way to cast lazily the items of an iterable, etc. One small but important thing I think has to be fixed in D2 is related to associative arrays in D: iteration has to happen first of all on keys. This gives big practical advantages. In my libs all functions when confronted with AAs use their keys first. In my dlibs I have implemented several other things that I hope to see in D2, like the record/Record that is a much better Tuple, a better equality testing among arrays, testing and comparison among associative arrays, etc. I have explained such things few times, but I am willing to explain them again, if there's someone willing to listen.
I understand. It would be great to solve this to everybody's satisfaction, and I think it's entirely possible. First off, allow me a rather general comment stemming from experience. The problem with little ideas like that is exactly that they are little - they are often hard to give credit for, and they may occur independently to many people. Over time I've shared many such little ideas with the C++ community in various venues, and I've often found them used without credit. The true solution is simply to let that go and hunt for bigger ideas. Bigger ideas have more impact, have larger uses, and are less likely to occur to others independently. Second, you'll hopefully grant me that I'd heard of the benefits of lazy evaluation before having joined this group. The reason today's std.algorithm forgoes lazy evaluation is that iterating lazy structures is either very cumbersome (proxies) or very inefficient (opApply). Cumbersome abstractions will hardly become popular, and I hold costly abstractions in utter contempt. They are a dime a dozen, and putting them straight in the core language and/or standard library would surely hamstring everything using them in actuality or in spirit. (Alan Perlis: "Lisp programmers know the value of everything and the cost of nothing." Well that has changed in modern Lisp because the prevalence of macros has grown appropriately, but it's still funny.) That's why the real problems to solve were things like figuring out an efficient way to define higher-order functions (via alias usage versus slow delegate usage), a solid iteration abstraction, and rendering opApply obsolete. That was the hard problem, not implementing higher-order functions or lazy algorithms. Once the solution was in place, the door is open for all sorts of cool stuff, and here's where you can add beneficial influence. This brings me to a third issue. If you want to make dlibs popular, talking about it on the newsgroup is a very inefficient way to do so. 
Many discussions come and go on the newsgroup. I seem to remember that for the longest time you only provided your library as a downloadable zip file (with html documentation *inside* the zip). I want you to sit for a minute and ponder about the obtuseness of such a setup. It's almost like you wanted to just make a token acknowledgment of your own work, while keeping it in fact thoroughly obscure. Then, after finally having put the library online (at my behest as far as I remember), its online documentation could stand significant improvement and examples. You could do so with a fraction of the effort you spend on this newsgroup e.g. explaining your putr function (which, as expected, came and went and is now largely forgotten). If I were you I'd probably add some nice examples to the documentation and put a link to it in my .sig. Then, whenever you want to make a point, you can simply refer people to links to your documentation and examples, instead of sitting down and writing rationale and examples every time the discussion comes close. Andrei
Jan 13 2009
parent reply Jason House <jason.james.house gmail.com> writes:
Andrei Alexandrescu Wrote:

 bearophile wrote:
 Andrei Alexandrescu:
 I'll wait for bearophile to tell if he feels he hasn't gotten the
 credit he believes he deserves before I answer this particular
 point.
I'm having a bad week for matters unrelated to D. You are doing lot of work for D, so don't worry for me. I was just a bit sad seeing you inventing some things I use often and I have shown here few times. Consider the idea of having both lazy/eager versions of your functors. Consider the idea of having a functionality like the xchain functor and Chainable class mixin of my libs (that is lazy chaining of arbitrary iterables, eager and lazy) using ~. I have also created various other things that you may re-invent in the following days and weeks, like the Xcast, a way to cast lazily the items of an iterable, etc. One small but important thing I think has to be fixed in D2 is related to associative arrays in D: iteration has to happen first of all on keys. This gives big practical advantages. In my libs all functions when confronted with AAs use their keys first. In my dlibs I have implemented several other things that I hope to see in D2, like the record/Record that is a much better Tuple, a better equality testing among arrays, testing and comparison among associative arrays, etc. I have explained such things few times, but I am willing to explain them again, if there's someone willing to listen.
I understand. It would be great to solve this to everybody's satisfaction, and I think it's entirely possible. First off, allow me a rather general comment stemming from experience. The problem with little ideas like that is exactly that they are little - they are often hard to give credit for, and they may occur independently to many people. Over time I've shared many such little ideas with the C++ community in various venues, and I've often found them used without credit. The true solution is simply to let that go and hunt for bigger ideas. Bigger ideas have more impact, have larger uses, and are less likely to occur to others independently. Second, you'd hopefully give me that I'd heard of the benefits of lazy evaluation before having joined this group. The reason today's std.algorithm forgoes lazy evaluation is that iterating lazy structures is either very cumbersome (proxies) or very inefficient (opApply). Cumbersome abstractions will hardly become popular, and I hold costly abstractions in utter contempt. They are a dime a dozen and putting them straight in the core language and/or standard library would surely hamstring everything using them in actuality or in spirit. (Alan Perlis: "Lisp programmers know the value of everything and the cost of nothing." Well that has changed in modern Lisp because the prevalence of macros has grown appropriately, but it's still funny.) That's why the real problems to solve were things like figuring out an efficient way to define higher-order functions (via alias usage versus slow delegate usage), a solid iteration abstraction, and rendering opApply obsolete. That was the hard problem, not implementing higher-order functions or lazy algorithms. Once the solution was in place, the door is open for all sorts of cool stuff, and here's where you can add beneficial influence. This brings me to a third issue. If you want to make dlibs popular, talking about it on the newsgroup is a very inefficient way. 
Many discussions come and go on the newsgroup. I seem to remember that for the longest time you only provided your library as a downloadable zip file (with html documentation *inside* the zip). I want you to sit for a minute and ponder about the obtuseness of such a setup. It's almost like you wanted to just make a token acknowledgment of your own work, while keeping it in fact thoroughly obscure. Then, after finally having put the library online (at my behest as far as I remember), its online documentation could stand significant improvement and examples. You could do so with a fraction of the effort you spend on this newsgroup e.g. explaining your putr function (which, as expected, came and went and is now largely forgotten). If I were you I'd probably add some nice examples to the documentation and put a link to it in my .sig. Then, whenever you want to make a point, you can simply refer people to links to your documentation and examples, instead of sitting down and writing rationale and examples every time the discussion comes close. Andrei
Point #3 is on the mark. A URL to quality documentation is worth 100 posts declaring the superiority of dlibs.
Jan 13 2009
parent reply Bill Baxter <wbaxter gmail.com> writes:
On Wed, Jan 14, 2009 at 7:39 AM, Jason House
<jason.james.house gmail.com> wrote:

 Point #3 is on the mark. A URL to quality documentation is worth 100 posts
declaring the superiority of dlibs.
A URL to browseable source wouldn't hurt either. Given how much you promote your library here, it's surprising to me that you (bearophile) don't at least have a dsource/googlecode/sourceforge project for it where one could go to find out more. Usually when you mention it you don't give any link at all. You just say "my libs". I haven't tried, but I suspect googling for "my libs" isn't going to lead me to your code. --bb
Jan 13 2009
parent reply davidl <davidl 126.com> writes:
On Wed, 14 Jan 2009 07:43:53 +0800, Bill Baxter <wbaxter gmail.com> wrote:

 On Wed, Jan 14, 2009 at 7:39 AM, Jason House
 <jason.james.house gmail.com> wrote:

 Point #3 is on the mark. A URL to quality documentation is worth 100
 posts declaring the superiority of dlibs.
A URL to browseable source wouldn't hurt either. Given how much you promote your library here, it's surprising to me that you (bearophile) don't at least have a dsource/googlecode/sourceforge project for it where one could go to find out more. Usually when you mention it you don't give any link at all. You just say "my libs". I haven't tried, but I suspect googling for "my libs" isn't going to lead me to your code. --bb
That's an unfair accusation. I think he posted: http://www.fantascienza.net/leonardo/so/dlibs
Jan 14 2009
next sibling parent reply Bill Baxter <wbaxter gmail.com> writes:
On Thu, Jan 15, 2009 at 4:58 PM, davidl <davidl 126.com> wrote:
 On Wed, 14 Jan 2009 07:43:53 +0800, Bill Baxter <wbaxter gmail.com> wrote:

 On Wed, Jan 14, 2009 at 7:39 AM, Jason House
 <jason.james.house gmail.com> wrote:

 Point #3 is on the mark. A URL to quality documentation is worth 100
 posts declaring the superiority of dlibs.
A URL to browseable source wouldn't hurt either. Given how much you promote your library here, it's surprising to me that you (bearophile) don't at least have a dsource/googlecode/sourceforge project for it where one could go to find out more. Usually when you mention it you don't give any link at all. You just say "my libs". I haven't tried, but I suspect googling for "my libs" isn't going to lead me to your code. --bb
That's an unfair accusation. I think he posted: http://www.fantascienza.net/leonardo/so/dlibs
Hmm. Gives me 403 forbidden. I think you're right that he has posted a URL before. But the point is still, if you're going to say "you should take a look at this" you need to make it easy to find "this", or nobody's going to go look at it. *Every time* I mention Multiarray (http://www.dsource.org/projects/multiarray) or some other project I'm involved with, I provide a link. If you actually want people to look at something that's what you gotta do. Just common sense, really. Especially when the name for the thing is something generic that Google probably won't work on. --bb
Jan 15 2009
parent Aarti_pl <aarti interia.pl> writes:
Bill Baxter pisze:
 On Thu, Jan 15, 2009 at 4:58 PM, davidl <davidl 126.com> wrote:
 On Wed, 14 Jan 2009 07:43:53 +0800, Bill Baxter <wbaxter gmail.com> wrote:

 On Wed, Jan 14, 2009 at 7:39 AM, Jason House
 <jason.james.house gmail.com> wrote:

 Point #3 is on the mark. A URL to quality documentation is worth 100
 posts declaring the superiority of dlibs.
A URL to browseable source wouldn't hurt either. Given how much you promote your library here, it's surprising to me that you (bearophile) don't at least have a dsource/googlecode/sourceforge project for it where one could go to find out more. Usually when you mention it you don't give any link at all. You just say "my libs". I haven't tried, but I suspect googling for "my libs" isn't going to lead me to your code. --bb
That's an unfair accusation. I think he posted: http://www.fantascienza.net/leonardo/so/dlibs
Hmm. Gives me 403 forbidden. I think you're right that he has posted a URL before. But the point is still, if you're going to say "you should take a look at this" you need to make it easy to find "this", or nobody's going to go look at it. *Every time* I mention Multiarray (http://www.dsource.org/projects/multiarray) or some other project I'm involved with, I provide a link. If you actually want people to look at something that's what you gotta do. Just common sense, really. Especially when the name for the thing is something generic that Google probably won't work on. --bb
I agree; I also usually provide a link every time I mention my libs. Try this link - it works: http://www.fantascienza.net/leonardo/so BR Marcin Kuszczak (aarti_pl)
Jan 15 2009
prev sibling parent bearophile <bearophileHUGS lycos.com> writes:
davidl:
 I think he posted:
 http://www.fantascienza.net/leonardo/so/dlibs
I think you meant this: http://www.fantascienza.net/leonardo/so/dlibs/all.html Bye, bearophile
Jan 15 2009
prev sibling parent Georg Wrede <georg nospam.org> writes:
On Mon, 2009-01-12 at 21:32 -0800, Andrei Alexandrescu wrote:


 I'll hazard a guess that it's because some ideas that have originated 
 from the community don't always get recognition by the D team.  Later, 
 when the D team independently engineers similar ideas, it is likely to 
 be exasperating to the community members that promoted them in the first 
 place (and worked hard to implement them too).
...
 Many people, when they acquire an idea, understand and internalize it to 
 the point where they essentially reinvent it. Sometimes I noticed this 
 happens to Walter - he hears something without understanding it, thinks 
 it through, and comes back with it after having truly rediscovered it. 
 I've had to ask him explicitly to grant me credit in a few instances, 
 one of I remember right now being the scope statements. There are a few 
 others that I even know it's useless to ask credit for... such as 
 typelists which got mis-baptized as tuples, foam at my mouth 
 notwithstanding :o). Therefore, I'm sure many people who must also be 
 feeling Walter (or Bartosz or myself) should have credited properly, but 
 weren't friends enough with him/them/us to casually ask for it. This can 
 only be frustrating, so by all means do tell about it.
 
 What I'm trying to say is that every person has this and that little 
 quirk, and that Walter is one of the most honest and ethical persons I 
 have ever met. So if there's any odd snubbing effect or whatnot, that is 
 not intentional and could be easily corrected.
That seems to be somewhat universal. I've helped two separate persons, at different times, who were trying to decide to which part of town to move. On both occasions I spent considerable time explaining and advertising where they should move. After several months they both moved to where I suggested, and then, a few years later, both literally believed they had come up with the idea themselves. One of them actually got upset when I reminded her that it was actually thanks to me that she now lived where she did.

Another example was a technician at a company that imported Commodore-64 computers. The company had to set up some kind of repair shop for the warranty repairs, and they had no equipment for digital diagnostics at all. Usually the symptom was "it doesn't start". The guy was heavily into electronics; he had even built a high-end stereo amplifier. But he had no experience with digital electronics. I showed him that you can take an AM radio, tune it between stations, and then listen to the radio. I taught him how he can hear the difference between a CPU problem, bad ROM, bad RAM, or a bad display chip. It took a while, but then he got it. Then, 20 years later, when we met by chance, he bragged to me about how he had invented a way to know what's wrong with those computers with only a radio. And he got upset when I reminded him of my version of the story.

(I originally got the idea when I had bought my first HP-25 programmable calculator. I happened to be listening to Radio Luxembourg, which was the only radio station in the '70s that played good music in Europe. I could "hear" what the calculator was doing! Later, with my VIC-20, I used to listen to the computer "think", just for fun.)

All these people genuinely and honestly believed they had come up with the stuff themselves. It's happened to me, too, here on this very newsgroup. I have many examples.
Lest folks get angry, I will not list the things here, but that's all on the record, if one wants to wade through the previous years of posts. It happens all the time, both here and around us. Of course we all should strive to give credit where credit is due, but I don't really believe people should start to cry every time "their idea" gets reinvented. That's just life. And we should be here to further D, not to gather personal merit or fame.
Jan 15 2009
prev sibling parent Bill Baxter <wbaxter gmail.com> writes:
On Tue, Jan 13, 2009 at 10:47 AM, bearophile <bearophileHUGS lycos.com> wrote:
 Bill Baxter:

 I believe (without having measured... which means that I am essentially
 lying) that we can safely assume the lunch will be free or low cost. The
 copying of the underlying range should be cheap except for the shortest
 ranges.
Using array(xmap()) in my dlibs calls the opApply of the xmap, which is measurably slower than using map(), so I keep both. I also keep both because it's handy to have both.
 Another commonly used func from Python is zip().  Not sure if that one
 is doable in D because it relies heavily on Python's tuples to work,
 but in Python it offers a very clean solution to iterating over
 several lists at once.
Take a look at azip, zip and xzip in my dlibs.
 Also I really hope we'll be seeing lazy versions of the associative
 array properties .keys and .values!
Take a look at xkeys and xvalues of my dlibs. My interest in this community and in D is decreasing quickly.
It should be the opposite, no? It's looking like we're finally getting a way to implement those things you've had in your libs *efficiently* in D, without having to go through the slow opApply pathway. I would think you'd be jumping up and down for joy, not leaving the community. --bb
Jan 12 2009
prev sibling next sibling parent reply Brian Palmer <d brian.codekitchen.net> writes:
Andrei Alexandrescu Wrote:

 [snip snip]
 Yah, I'll look into it. One thing is that not that many functions will 
 end up being lazy. For example, reduce is eager (I think it's 
 uninteresting to have it default to laziness).
Actually, if I can pick a nit real quick here, I disagree that a lazy reduce isn't interesting: there are a lot of very interesting algorithms based on lazy folding, and on lazy folding of infinite lists. After all, fold is the "mother" of all those other iterators like map, filter, etc. In any case, it sounds like implementing a lazy reduce should be simple enough even if it's not in std.algorithm itself, so I'm not set on it. -- Brian
Jan 12 2009
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Brian Palmer wrote:
 Andrei Alexandrescu Wrote:
 
 [snip snip] Yah, I'll look into it. One thing is that not that many
 functions will end up being lazy. For example, reduce is eager (I
 think it's uninteresting to have it default to laziness).
Actually, if I can pick a nit real quick here, I disagree that a lazy reduce isn't interesting: there are a lot of very interesting algorithms based on lazy folding, and on lazy folding of infinite lists. After all, fold is the "mother" of all those other iterators like map, filter, etc. In any case, it sounds like implementing a lazy reduce should be simple enough even if it's not in std.algorithm itself, so I'm not set on it.
Exactly. Lazy reduce is easy to implement e.g. by means of a delegate. On the other hand, lazy map is unimplementable from eager map. Andrei
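Andrei's remark about a delegate can be sketched like so. This is a hedged illustration in modern D syntax (not the 2009 Phobos API), and `lazyReduce` is a hypothetical name, not a standard-library function: the fold is deferred until the returned delegate is actually called.

```d
import std.range.primitives : empty, front, popFront;

// Hypothetical helper: returns a delegate that performs the fold
// only when invoked -- no work is done at the call to lazyReduce.
auto lazyReduce(alias fun, R)(R r)
{
    return () {
        auto acc = r.front;   // seed with the first element
        r.popFront();
        for (; !r.empty; r.popFront())
            acc = fun(acc, r.front);
        return acc;
    };
}

void main()
{
    int[] arr = [1, 2, 3, 4];
    auto total = lazyReduce!((a, b) => a + b)(arr);
    // Nothing has been folded yet; the work happens on this call:
    assert(total() == 10);
}
```

Note the delegate consumes its captured range, so it is good for a single evaluation; a reusable version would need to copy the range first.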
Jan 12 2009
prev sibling parent reply "Nick Sabalausky" <a a.a> writes:
"Andrei Alexandrescu" <SeeWebsiteForEmail erdani.org> wrote in message 
news:gkgdfl$2too$1 digitalmars.com...
 Bill Baxter wrote:
 But I'm just guessing there will be some penalty for doing everything
 lazily in the case where you know you want a full array.  More data is
 needed on that I think.

 But even assuming it's a free lunch, I'd want a better way to make an
 array than putting .eager on everything.  First off, that's not even
 remotely a verb (like .eval() or .exec()), nor is it a noun that
 clearly expresses the type it will become (ala array() or toArray()).
I believe (without having measured... which means that I am essentially lying) that we can safely assume the lunch will be free or low cost. The copying of the underlying range should be cheap except for the shortest ranges.
Could it get in the way of parallelizing map?
Jan 12 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Nick Sabalausky wrote:
 "Andrei Alexandrescu" <SeeWebsiteForEmail erdani.org> wrote in message 
 news:gkgdfl$2too$1 digitalmars.com...
 Bill Baxter wrote:
 But I'm just guessing there will be some penalty for doing everything
 lazily in the case where you know you want a full array.  More data is
 needed on that I think.

 But even assuming it's a free lunch, I'd want a better way to make an
 array than putting .eager on everything.  First off, that's not even
 remotely a verb (like .eval() or .exec()), nor is it a noun that
 clearly expresses the type it will become (ala array() or toArray()).
I believe (without having measured... which means that I am essentially lying) that we can safely assume the lunch will be free or low cost. The copying of the underlying range should be cheap except for the shortest ranges.
Could it get in the way of parallelizing map?
Dunno. According to SPJ, automatically parallelizing map was a failed experiment in Haskell. Explicit parallelizing a la pmap seems to be the way to go. Andrei
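The explicit, pmap-style opt-in that Andrei mentions later became available in D itself. A hedged sketch (this uses std.parallelism, which did not exist in 2009; `sq` is just a local helper for illustration):

```d
import std.algorithm : map;
import std.parallelism : taskPool;

// Plain named function so the parallel map can infer its type.
int sq(int a) { return a * a; }

void main()
{
    int[] arr = [1, 2, 3, 4];

    // Sequential lazy map: evaluated element by element, in order.
    auto lazySquares = arr.map!(a => a * a);

    // Explicit opt-in parallelism, a la pmap: the programmer asks
    // for parallel evaluation instead of the library guessing.
    auto parSquares = taskPool.amap!sq(arr);

    assert(parSquares == [1, 4, 9, 16]);
}
```

The key design point matches SPJ's observation: parallelism is requested at the call site, not inferred from the use of map.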
Jan 12 2009
parent reply Robert Fraser <fraserofthenight gmail.com> writes:
Andrei Alexandrescu wrote:
 Dunno. According to SPJ, automatically parallelizing map was a failed 
 experiment in Haskell. Explicit parallelizing a la pmap seems to be the 
 way to go.
Source? I think as processors grow in number, automatic parallelization will become increasingly important, so I'm wondering why it was a "failed experiment"?
Jan 12 2009
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Robert Fraser wrote:
 Andrei Alexandrescu wrote:
 Dunno. According to SPJ, automatically parallelizing map was a failed 
 experiment in Haskell. Explicit parallelizing a la pmap seems to be 
 the way to go.
Source? I think as processors grow in number, automatic parallelization will become increasingly important, so I'm wondering why it was a "failed experiment"?
Private conversation. Andrei
Jan 12 2009
parent reply Robert Fraser <fraserofthenight gmail.com> writes:
Andrei Alexandrescu wrote:
 Robert Fraser wrote:
 Andrei Alexandrescu wrote:
 Dunno. According to SPJ, automatically parallelizing map was a failed 
 experiment in Haskell. Explicit parallelizing a la pmap seems to be 
 the way to go.
Source? I think as processors grow in number, automatic parallelization will become increasingly important, so I'm wondering why it was a "failed experiment"?
Private conversation. Andrei
Did he mention what the implementation was like and/or what the problem was (e.g. too much overhead)?
Jan 13 2009
parent reply Fawzi Mohamed <fmohamed mac.com> writes:
On 2009-01-13 10:27:10 +0100, Robert Fraser <fraserofthenight gmail.com> said:

 Andrei Alexandrescu wrote:
 Robert Fraser wrote:
 Andrei Alexandrescu wrote:
 Dunno. According to SPJ, automatically parallelizing map was a failed 
 experiment in Haskell. Explicit parallelizing a la pmap seems to be the 
 way to go.
Source? I think as processors grow in number, automatic parallelization will become increasingly important, so I'm wondering why it was a "failed experiment"?
Private conversation. Andrei
Did he mention what the implementation was like and/or what the problem was (e.g. too much overhead)?
Probably something of the following: 1) In general, when you do map you cannot exclude that the user expects sequentiality, or at least no parallelism. 2) Forward iterators (and lazy lists are such) are *very* sequential. By the way, now that you are going more in the lazy direction, you might want to consider again the problem of pure functions not accepting lazy structures (invariant structures have to be fully evaluated). I had brought this up quite some time ago... Fawzi
Jan 13 2009
parent reply Robert Fraser <fraserofthenight gmail.com> writes:
Fawzi Mohamed wrote:
 Probably something of the following:
 1) In general when you do map you cannot exclude that the user expects 
 sequentiality, or at least not parallelism.
 
 2) Forward iterators (and lazy lists are such) are *very*
 sequential.
Yes, but Haskell is pure-functional.
Jan 13 2009
parent Bill Baxter <wbaxter gmail.com> writes:
On Wed, Jan 14, 2009 at 6:39 AM, Robert Fraser
<fraserofthenight gmail.com> wrote:
 Fawzi Mohamed wrote:
 Probably something of the following:
 1) In general when you do map you cannot exclude that the user expects
 sequentiality, or at least not parallelism.

 2) Forward iterators (and lazy lists are such) are *very*
 sequential.
Yes, but Haskell is pure-functional.
Sure, but in a lazy list the value of element n+1 can depend on any or all of elements 0..n. Like a lazy list of Fibonacci numbers: there's not much you can do to automatically parallelize that. Just a guess though. I have no idea what the problem was. --bb
Jan 13 2009
prev sibling next sibling parent Jason House <jason.james.house gmail.com> writes:
It sounds good.  I have one question: 
  How do we avoid surprising the user?

That can be...
  lazy output when expecting non-lazy
  non-lazy output when expecting lazy
  later input changes altering output

When returning lazy should be safe:
  immutable input data
  consumable ranges
  the output range can be of the same type as the input ranges

Confusing / easy to misuse cases:
  Arrays of mutable data
  Slices of static arrays (or any other scope data)
  Changing the default behavior depending on the function

Some candidate ideas:
  Create std.algorithm and std.lazyalgorithm
   * Allows user to pick what they want (when importing only one)
   * Overload sets can help with ambiguous cases
  Slight name mangling of lazy and non-lazy versions (map vs. xmap)


Jan 12 2009
prev sibling parent noobie <a b.com> writes:
Andrei Alexandrescu Wrote:

 noobie wrote:
 Andrei Alexandrescu Wrote:
 
 int[] arr = [ 1, 2, 3, 4 ];
 auto squares = map!("a * a")(arr);
 arr[] = [ 5, 6, 7, 8 ];

 Now iterating squares will see different numbers than the original ones.
Okay, what is the problem in maintaining a reference to the original array values?
Efficiency. You'd have to copy the whole range, and in fact deep copy it. Andrei
Why can't it be done like copy-on-write? So all such functions would be lazy by default; maybe on seeing code that modifies the original values, a copy is created? Also, with the current proposal I think it's still better to provide two versions than simply the lazy function plus .eager. Using standard library functions should involve very little typing, and the names make the code more legible. Thanks.
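The aliasing surprise under discussion, and the explicit-snapshot alternative to copy-on-write, can be shown directly. A hedged sketch in modern D (using std.array's `array` where the thread says `eager`): the lazy view aliases the input, so pinning down values costs an explicit copy, which is exactly the deep copy Andrei says COW would also need.

```d
import std.algorithm : map;
import std.array : array;

void main()
{
    int[] arr = [1, 2, 3, 4];
    auto lazySquares = arr.map!(a => a * a); // a view over arr
    auto snapshot = lazySquares.array;       // eager copy, right now

    arr[] = [5, 6, 7, 8];                    // mutate the input

    assert(snapshot == [1, 4, 9, 16]);       // the copy is unaffected
    assert(lazySquares.front == 25);         // the view sees new data
}
```

So the cost COW would hide does not go away; the lazy design just makes the moment of copying visible at the call site.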
Jan 13 2009