
digitalmars.D - Dynamic Closure + Lazy Arguments = Performance Killer?

reply Jason House <jason.james.house gmail.com> writes:
I ported some Monte Carlo simulation code from Java to D2, and performance is
horrible.

34% of the execution time is used by std.random.uniform. To my great surprise,
25% of the execution  time is memory allocation (and collection) from that
random call. The only candidate source I see is a call to ensure with lazy
arguments. The memory allocation occurs at the start of the UniformDistribution
call. I assume this is dynamic closure kicking in.

Can anyone verify that this is the case?

600000 memory allocations per second really kills performance!
Oct 24 2008
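[For readers wondering why a lazy parameter allocates at all: under D2, a `lazy int` parameter is lowered to a hidden `int delegate()` built from the argument expression, and because D2 conservatively heap-allocates the enclosing frame for closures, every call pays for a GC allocation. A minimal hand-written sketch of the lowering (not actual compiler output):]

```d
// What `void bar(lazy int i)` roughly becomes under the hood:
void bar(int delegate() i) {}

void foo(int i)
{
    // The argument expression is wrapped in a delegate over foo's frame.
    // D2 cannot prove the delegate doesn't escape bar, so it
    // heap-allocates foo's frame on every call.
    bar(delegate int() { return i; });
}
```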
next sibling parent reply Gregor Richards <Richards codu.org> writes:
Jason House wrote:
 I ported some monte carlo simulation code from Java to D2, and performance is
horrible.
 
 34% of the execution time is used by std.random.uniform. To my great surprise,
25% of the execution  time is memory allocation (and collection) from that
random call. The only candidate source I see is a call to ensure with lazy
arguments. The memory allocation occurs at the start of the UniformDistribution
call. I assume this is dynamic closure kicking in.
 
 Can anyone verify that this is the case?
 
 600000 memory allocations per second really kills performance!

Java has a much better garbage collector than D, as it doesn't need to be conservative. - Gregor Richards
Oct 24 2008
parent Jason House <jason.james.house gmail.com> writes:
Gregor Richards Wrote:

 Jason House wrote:
 I ported some monte carlo simulation code from Java to D2, and performance is
horrible.
 
 34% of the execution time is used by std.random.uniform. To my great surprise,
25% of the execution  time is memory allocation (and collection) from that
random call. The only candidate source I see is a call to ensure with lazy
arguments. The memory allocation occurs at the start of the UniformDistribution
call. I assume this is dynamic closure kicking in.
 
 Can anyone verify that this is the case?
 
 600000 memory allocations per second really kills performance!

Java has a much better garbage collector than D, as it doesn't need to be conservative. - Gregor Richards

The code is written to explicitly avoid memory allocation, especially in tight loops. Without this dynamic closure, the garbage collector would never run. This case is especially pathetic since the call to ensure will never trigger. This is part of a mini language shootout. The Java version I cloned runs 4x faster. This is only one piece of a much bigger problem.
Oct 24 2008
prev sibling next sibling parent reply Frank Benoit <keinfarbton googlemail.com> writes:
Jason House schrieb:
 I ported some monte carlo simulation code from Java to D2, and
 performance is horrible.
 
 34% of the execution time is used by std.random.uniform. To my great
 surprise, 25% of the execution  time is memory allocation (and
 collection) from that random call. The only candidate source I see is
 a call to ensure with lazy arguments. The memory allocation occurs at
 the start of the UniformDistribution call. I assume this is dynamic
 closure kicking in.
 
 Can anyone verify that this is the case?
 
 600000 memory allocations per second really kills performance!

It was written in this NG over and over. The D2 full closure feature is a BIG!!!! problem. Nested functions passed as callbacks are an important performance technique in D. The D2 full closure "feature" effectively removes it and makes D2 less attractive IMHO.
Oct 24 2008
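[The callback pattern being referred to looks roughly like the sketch below; in D1 the delegate simply points at the caller's live stack frame, with no allocation:]

```d
// D1-style: pass a nested function as a callback with zero allocation.
void forEach(int[] arr, void delegate(int) cb)
{
    foreach (x; arr)
        cb(x);
}

void sumExample()
{
    int sum = 0;
    void add(int x) { sum += x; }  // nested function over the local `sum`
    forEach([1, 2, 3], &add);      // D1: no heap allocation
                                   // D2: heap-allocates sumExample's frame
}
```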
next sibling parent "Bill Baxter" <wbaxter gmail.com> writes:
On Sat, Oct 25, 2008 at 7:23 AM, Frank Benoit
<keinfarbton googlemail.com> wrote:
 Jason House schrieb:
 I ported some monte carlo simulation code from Java to D2, and
 performance is horrible.

 34% of the execution time is used by std.random.uniform. To my great
 surprise, 25% of the execution  time is memory allocation (and
 collection) from that random call. The only candidate source I see is
 a call to ensure with lazy arguments. The memory allocation occurs at
 the start of the UniformDistribution call. I assume this is dynamic
 closure kicking in.

 Can anyone verify that this is the case?

 600000 memory allocations per second really kills performance!

It was written in this NG over and over. The D2 full closure feature is a BIG!!!! problem. Nested functions passed as callbacks are an important performance technique in D. The D2 full closure "feature" effectively removes it and makes D2 less attractive IMHO.

Not to mention that, among the top problems plaguing D2 currently, it should be one of the easier things to fix. Far easier than figuring out overhauls for operator overloading, construction syntax, how ranges should work, forward references, how 'shared' should work, or merging Tango and Phobos. Compared to those, this one is pretty easy to solve! Personally I think no-alloc should be the default, with different syntax to get a full closure. Using the "new" keyword somehow makes sense to me. --bb
Oct 24 2008
prev sibling parent Robert Fraser <fraserofthenight gmail.com> writes:
Frank Benoit wrote:
 Jason House schrieb:
 I ported some monte carlo simulation code from Java to D2, and
 performance is horrible.

 34% of the execution time is used by std.random.uniform. To my great
 surprise, 25% of the execution  time is memory allocation (and
 collection) from that random call. The only candidate source I see is
 a call to ensure with lazy arguments. The memory allocation occurs at
 the start of the UniformDistribution call. I assume this is dynamic
 closure kicking in.

 Can anyone verify that this is the case?

 600000 memory allocations per second really kills performance!

It was written in this NG over and over. The D2 full closure feature is a BIG!!!! problem. Nested functions passed as callbacks are an important performance technique in D. The D2 full closure "feature" effectively removes it and makes D2 less attractive IMHO.

Agreed. Is it in bugzilla?
Oct 24 2008
prev sibling next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Jason House:

 34% of the execution time is used by std.random.uniform.

The Kiss generator of Tango is much faster, and there's a still faster (but still good) random number generator around...
 Can anyone verify that this is the case?

Can you show us working minimal code I/we can test?

-------------------

Frank Benoit:
The D2 full closure "feature" effectively removes it and makes D2 less
attractive IMHO.<

The first simple solution is to allow adding "scope" to closures so they don't use the heap (though I don't know how to do that in every situation, and it makes the already long lambda syntax even longer). But probably the best way for D to become more functional (normal functional programming is often full of functions that get passed around everywhere, often closures, if only virtually) is to grow better optimizing capabilities, so that closures are no longer a problem. There are many ways to perform such optimizations (though in a language mostly based on side effects it's less easy). Bye, bearophile
Oct 24 2008
next sibling parent reply Jason House <jason.james.house gmail.com> writes:
bearophile wrote:

 Jason House:
 
 34% of the execution time is used by std.random.uniform.

Kiss of Tango is much faster, and there's a much faster still (but good still) rnd generator around...
 Can anyone verify that this is the case?

Can you show us a working minimal code I/we can test? ------------------- Frank Benoit:
The D2 full closure "feature" effectively removes it and makes D2 less
attractive IMHO.<

The first simple solution is to add the possibility of adding "scope" to closures to not use the heap (but I don't know how to do that in every situation, and it makes the already long syntax of lambdas even longer). But the probably best way for D to become more functional (and normal functional programming is often full of functions that move everywhere, often they are closures, but only virtually) is to grow some more optimizing capabilities, so closures aren't a problem anymore. There are many ways to perform such optimizations (but in a language mostly based on side effects it's less easy). Bye, bearophile

The following spends 90% of its time in _d_alloc_memory:

void bar(lazy int i){}
void foo(int i){ bar(i); }
void main(){ foreach(int i; 1..1000000) foo(i); }

Compiling with -O -release reduces it to 88% :)
Oct 24 2008
parent reply bearophile <bearophileHUGS lycos.com> writes:
Jason House:
 The following spends 90% of its time in _d_alloc_memory
 void bar(lazy int i){}
 void foo(int i){ bar(i); }
 void main(){ foreach(int i; 1..1000000) foo(i); }
 Compiling with -O -release reduces it to 88% :)

I see. So I presume it becomes quite difficult for D2 to compute up to the 25th term of this sequence (the D code is in the middle of the page; it takes just a few seconds to run on D1):
http://en.wikipedia.org/wiki/Man_or_boy_test

What syntax can we use to avoid heap allocation? A few ideas:

void bar(lazy int i){}        // like D1
void bar(scope lazy int i){}  // like D1
void bar(closure int i){}     // like current D2

Bye,
bearophile
Oct 24 2008
parent reply Jason House <jason.james.house gmail.com> writes:
bearophile Wrote:

 Jason House:
 The following spends 90% of its time in _d_alloc_memory
 void bar(lazy int i){}
 void foo(int i){ bar(i); }
 void main(){ foreach(int i; 1..1000000) foo(i); }
 Compiling with -O -release reduces it to 88% :)

I see. So I presume it becomes quite difficult for D2 to compute up to the 25th term of this sequence (the D code is in the middle of the page) (it takes just few seconds to run on D1): http://en.wikipedia.org/wiki/Man_or_boy_test What syntax can we use to avoid heap allocation? Few ideas: void bar(lazy int i){} // like D1 void bar(scope lazy int i){} // like D1 void bar(closure int i){} // like current D2 Bye, bearophile

I would assume a fix would be to add scope to input delegates, and to require some kind of declaration on the caller's side when the compiler can't prove safety. It's best for ambiguous cases to be a warning (or an error). It also makes the code easier for readers to follow.
Oct 25 2008
next sibling parent Jason House <jason.james.house gmail.com> writes:
Bill Baxter Wrote:

 On Sat, Oct 25, 2008 at 5:24 PM, Jason House
 <jason.james.house gmail.com> wrote:
 bearophile Wrote:

 Jason House:
 The following spends 90% of its time in _d_alloc_memory
 void bar(lazy int i){}
 void foo(int i){ bar(i); }
 void main(){ foreach(int i; 1..1000000) foo(i); }
 Compiling with -O -release reduces it to 88% :)

I see. So I presume it becomes quite difficult for D2 to compute up to the 25th term of this sequence (the D code is in the middle of the page) (it takes just few seconds to run on D1): http://en.wikipedia.org/wiki/Man_or_boy_test What syntax can we use to avoid heap allocation? Few ideas: void bar(lazy int i){} // like D1 void bar(scope lazy int i){} // like D1 void bar(closure int i){} // like current D2


This makes no sense because the writer of bar has no idea whether the caller will need a heap allocation or not.
 I would assume a fix would be to add scope to input delegates and to require
some kind of declaration on the caller's side when the compiler can't prove
safety. It's best for ambiguous cases to be a warning (error). It also makes
the code easier for readers to follow.

I think for a language like D, hidden, hard-to-find memory allocations like the one Andrei didn't know he was doing should be eliminated. By that I mean stack allocation (D1 behavior) should be the default. Then for places where you really want a closure, some other syntax should be chosen.

The other reason I say that is that so far in D I've only very seldom really wanted an allocated closure. So I think I would have to use the funky no-closure-please syntax way more often than a make-me-a-closure-please syntax.

But apparently nobody who knows anything about what's actually going to happen is involved in this discussion, so I think I'll just pipe down for now. --bb

While I agree that should be the default, I've already seen plenty of D1 code that incorrectly used stack-based closures. It really depends on your usage patterns. I do a lot of inter-thread communication in D1.
Oct 25 2008
prev sibling parent reply Lars Ivar Igesund <larsivar igesund.net> writes:
Bill Baxter wrote:

 On Sat, Oct 25, 2008 at 5:24 PM, Jason House
 <jason.james.house gmail.com> wrote:
 bearophile Wrote:

 Jason House:
 The following spends 90% of its time in _d_alloc_memory
 void bar(lazy int i){}
 void foo(int i){ bar(i); }
 void main(){ foreach(int i; 1..1000000) foo(i); }
 Compiling with -O -release reduces it to 88% :)

I see. So I presume it becomes quite difficult for D2 to compute up to the 25th term of this sequence (the D code is in the middle of the page) (it takes just few seconds to run on D1): http://en.wikipedia.org/wiki/Man_or_boy_test What syntax can we use to avoid heap allocation? Few ideas: void bar(lazy int i){} // like D1 void bar(scope lazy int i){} // like D1 void bar(closure int i){} // like current D2


This makes no sense because the writer of bar has no idea whether the caller will need a heap allocation or not.
 I would assume a fix would be to add scope to input delegates and to
 require some kind of declaration on the caller's side when the compiler
 can't prove safety. It's best for ambiguous cases to be a warning
 (error). It also makes the code easier for readers to follow.

I think for a language like D, hidden, hard to find memory allocations like the one Andrei didn't know he was doing should be eliminated. By that I mean stack allocation (D1 behavior) should be the default. Then for places where you really want a closure, some other syntax should be chosen. The other reason I say that is that so far in D I've only very seldom really wanted an allocated closure. So I think I will have to use the funky no-closure-please syntax way more than I would have to use a make-me-a-closure-please syntax.

I agree that D1 behaviour should be the default, since otherwise it'll be yet another breaking change. However, I do understand that the D1 behaviour is the unsafe one, and as such the heap-allocated version has merit as the default.

-- 
Lars Ivar Igesund
blog at http://larsivi.net
DSource, #d.tango & #D: larsivi
Dancing the Tango
Oct 25 2008
next sibling parent Lars Ivar Igesund <larsivar igesund.net> writes:
Denis Koroskin wrote:

 On Sat, 25 Oct 2008 18:36:27 +0400, Lars Ivar Igesund
 <larsivar igesund.net> wrote:
 
 Bill Baxter wrote:

 On Sat, Oct 25, 2008 at 5:24 PM, Jason House
 <jason.james.house gmail.com> wrote:
 bearophile Wrote:

 Jason House:
 The following spends 90% of its time in _d_alloc_memory
 void bar(lazy int i){}
 void foo(int i){ bar(i); }
 void main(){ foreach(int i; 1..1000000) foo(i); }
 Compiling with -O -release reduces it to 88% :)

I see. So I presume it becomes quite difficult for D2 to compute up to the 25th term of this sequence (the D code is in the middle of the page) (it takes just few seconds to run on D1): http://en.wikipedia.org/wiki/Man_or_boy_test What syntax can we use to avoid heap allocation? Few ideas: void bar(lazy int i){} // like D1 void bar(scope lazy int i){} // like D1 void bar(closure int i){} // like current D2


This makes no sense because the writer of bar has no idea whether the caller will need a heap allocation or not.
 I would assume a fix would be to add scope to input delegates and to
 require some kind of declaration on the caller's side when the compiler
 can't prove safety. It's best for ambiguous cases to be a warning
 (error). It also makes the code easier for readers to follow.

I think for a language like D, hidden, hard to find memory allocations like the one Andrei didn't know he was doing should be eliminated. By that I mean stack allocation (D1 behavior) should be the default. Then for places where you really want a closure, some other syntax should be chosen. The other reason I say that is that so far in D I've only very seldom really wanted an allocated closure. So I think I will have to use the funky no-closure-please syntax way more than I would have to use a make-me-a-closure-please syntax.

I agree that D1 behaviour should be the default, since otherwise it'll be yet another breaking change. However, I do understand that the D1 behaviour is the unsafe one, and as such the heap allocated version has merit as the default.

I believe the default should be the one that is most frequently used, even if it is less safe. Otherwise you may end up with a lot of code duplication. I also think that scope and heap-allocated delegates should have different types so that no imlicit casting from scope delegate to heap one would be possible. In this case callee function that recieves the delegate might demand the delegate to be heap-allocated (because it stores it, for example).

I definitely agree with this.

-- 
Lars Ivar Igesund
blog at http://larsivi.net
DSource, #d.tango & #D: larsivi
Dancing the Tango
Oct 25 2008
prev sibling next sibling parent Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
 On Sat, Oct 25, 2008 at 10:44 AM, Denis Koroskin <2korden gmail.com> wrote:
 I also think that scope and heap-allocated delegates should have different
 types so that no imlicit casting from scope delegate to heap one would be
 possible. In this case callee function that recieves the delegate might
 demand the delegate to be heap-allocated (because it stores it, for
 example).


How would this work? For example:

-----
struct Struct {
    // fields...
    void foo() {
        // body
    }
}

void bar(Struct* p) {
    auto dg = &p.foo;  // stack-based or heap-based delegate?
    // do stuff with dg
}
-----

? (There's no way to know if *p is a heap-based or stack-based struct)

Jarrett Billingsley wrote:
 Fantastic.  That also neatly solves the "returning a delegate"
 problem; it simply becomes illegal to return a scope delegate.

Even if "scope delegate" becomes a different type, sometimes such a "scope delegate" is perfectly safe to return:

-----
alias scope void delegate() Dg;

Dg foo(Dg dg) {
    return dg;  // Why would this be illegal?
}
-----
Oct 25 2008
prev sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
"Denis Koroskin" wrote
 On Sat, 25 Oct 2008 18:36:27 +0400, Lars Ivar Igesund 
 <larsivar igesund.net> wrote:

 Bill Baxter wrote:

 On Sat, Oct 25, 2008 at 5:24 PM, Jason House
 <jason.james.house gmail.com> wrote:
 bearophile Wrote:

 Jason House:
 The following spends 90% of its time in _d_alloc_memory
 void bar(lazy int i){}
 void foo(int i){ bar(i); }
 void main(){ foreach(int i; 1..1000000) foo(i); }
 Compiling with -O -release reduces it to 88% :)

I see. So I presume it becomes quite difficult for D2 to compute up to the 25th term of this sequence (the D code is in the middle of the page) (it takes just few seconds to run on D1): http://en.wikipedia.org/wiki/Man_or_boy_test What syntax can we use to avoid heap allocation? Few ideas: void bar(lazy int i){} // like D1 void bar(scope lazy int i){} // like D1 void bar(closure int i){} // like current D2


This makes no sense because the writer of bar has no idea whether the caller will need a heap allocation or not.
 I would assume a fix would be to add scope to input delegates and to
 require some kind of declaration on the caller's side when the compiler
 can't prove safety. It's best for ambiguous cases to be a warning
 (error). It also makes the code easier for readers to follow.

I think for a language like D, hidden, hard to find memory allocations like the one Andrei didn't know he was doing should be eliminated. By that I mean stack allocation (D1 behavior) should be the default. Then for places where you really want a closure, some other syntax should be chosen. The other reason I say that is that so far in D I've only very seldom really wanted an allocated closure. So I think I will have to use the funky no-closure-please syntax way more than I would have to use a make-me-a-closure-please syntax.

I agree that D1 behaviour should be the default, since otherwise it'll be yet another breaking change. However, I do understand that the D1 behaviour is the unsafe one, and as such the heap allocated version has merit as the default.

I believe the default should be the one that is most frequently used, even if it is less safe. Otherwise you may end up with a lot of code duplication. I also think that scope and heap-allocated delegates should have different types so that no imlicit casting from scope delegate to heap one would be possible. In this case callee function that recieves the delegate might demand the delegate to be heap-allocated (because it stores it, for example).

I've been thinking about this solution, and I think the decision to allocate on the stack or the heap should be left up to the developer, and no types should be assigned. Think about an example like this:

class DelegateCaller
{
    private int delegate() _foo;
    this(int delegate() foo) { _foo = foo; }
    int callit() { return _foo(); }
}

int f1()
{
    int x() { return 5; }
    scope dc = new DelegateCaller(&x); // allocate on stack
    return dc.callit() * dc.callit();
}

DelegateCaller f2()
{
    int x() { return 5; }
    return new DelegateCaller(&x); // allocate on heap
}

So what type should DelegateCaller._foo be?

I think the only real solution to this, aside from compiler analysis (which introduces all kinds of problems), is to declare that all delegates are stack or heap allocated by default, and allow the developer to deviate by declaring the delegate as the opposite. As I think most function delegates are expected to be stack allocated, it makes sense to me that stack delegates should be the default.

As a suggestion for syntax, I'd say heap-allocated delegates should use the new keyword somehow:

return new DelegateCaller(new(&x));

One issue to determine is how heap-allocated delegates are done. Should there be only one heap allocation per function call, or one per instantiation? If the latter, what happens if you change data in the function after instantiation? The difference is significant if you create multiple delegates:

int delegate()[] foo;

int i = 0;
int getI() { return i; }

foo ~= new(&getI);
i++;
foo ~= new(&getI);
i++;
for(int j = 0; j < foo.length; j++)
{
    writefln(foo[j]());
}

What should be the correct output?

0
1

or

0
2

or

2
2

-Steve
Oct 26 2008
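[As an aside on Steve's question: with D2's closure semantics as actually implemented, all delegates formed in one activation of a function share a single heap-allocated frame, so captured variables are observed by reference and the answer would be "2 2". A minimal sketch:]

```d
import std.stdio;

void main()
{
    int i = 0;
    int getI() { return i; }

    int delegate()[] foo;
    foo ~= &getI;
    i++;
    foo ~= &getI;
    i++;

    // Both delegates point at the same (heap-allocated) frame of main,
    // so both observe the final value of i, not the value at append time.
    foreach (dg; foo)
        writeln(dg());  // prints 2, then 2
}
```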
prev sibling next sibling parent "Bill Baxter" <wbaxter gmail.com> writes:
On Sat, Oct 25, 2008 at 5:24 PM, Jason House
<jason.james.house gmail.com> wrote:
 bearophile Wrote:

 Jason House:
 The following spends 90% of its time in _d_alloc_memory
 void bar(lazy int i){}
 void foo(int i){ bar(i); }
 void main(){ foreach(int i; 1..1000000) foo(i); }
 Compiling with -O -release reduces it to 88% :)

I see. So I presume it becomes quite difficult for D2 to compute up to the 25th term of this sequence (the D code is in the middle of the page) (it takes just few seconds to run on D1): http://en.wikipedia.org/wiki/Man_or_boy_test What syntax can we use to avoid heap allocation? Few ideas: void bar(lazy int i){} // like D1 void bar(scope lazy int i){} // like D1 void bar(closure int i){} // like current D2


This makes no sense because the writer of bar has no idea whether the caller will need a heap allocation or not.
 I would assume a fix would be to add scope to input delegates and to require
some kind of declaration on the caller's side when the compiler can't prove
safety. It's best for ambiguous cases to be a warning (error). It also makes
the code easier for readers to follow.

I think for a language like D, hidden, hard-to-find memory allocations like the one Andrei didn't know he was doing should be eliminated. By that I mean stack allocation (D1 behavior) should be the default. Then for places where you really want a closure, some other syntax should be chosen.

The other reason I say that is that so far in D I've only very seldom really wanted an allocated closure. So I think I would have to use the funky no-closure-please syntax way more often than a make-me-a-closure-please syntax.

But apparently nobody who knows anything about what's actually going to happen is involved in this discussion, so I think I'll just pipe down for now. --bb
Oct 25 2008
prev sibling next sibling parent "Denis Koroskin" <2korden gmail.com> writes:
On Sat, 25 Oct 2008 18:36:27 +0400, Lars Ivar Igesund  
<larsivar igesund.net> wrote:

 Bill Baxter wrote:

 On Sat, Oct 25, 2008 at 5:24 PM, Jason House
 <jason.james.house gmail.com> wrote:
 bearophile Wrote:

 Jason House:
 The following spends 90% of its time in _d_alloc_memory
 void bar(lazy int i){}
 void foo(int i){ bar(i); }
 void main(){ foreach(int i; 1..1000000) foo(i); }
 Compiling with -O -release reduces it to 88% :)

I see. So I presume it becomes quite difficult for D2 to compute up to the 25th term of this sequence (the D code is in the middle of the page) (it takes just few seconds to run on D1): http://en.wikipedia.org/wiki/Man_or_boy_test What syntax can we use to avoid heap allocation? Few ideas: void bar(lazy int i){} // like D1 void bar(scope lazy int i){} // like D1 void bar(closure int i){} // like current D2


This makes no sense because the writer of bar has no idea whether the caller will need a heap allocation or not.
 I would assume a fix would be to add scope to input delegates and to
 require some kind of declaration on the caller's side when the compiler
 can't prove safety. It's best for ambiguous cases to be a warning
 (error). It also makes the code easier for readers to follow.

I think for a language like D, hidden, hard to find memory allocations like the one Andrei didn't know he was doing should be eliminated. By that I mean stack allocation (D1 behavior) should be the default. Then for places where you really want a closure, some other syntax should be chosen. The other reason I say that is that so far in D I've only very seldom really wanted an allocated closure. So I think I will have to use the funky no-closure-please syntax way more than I would have to use a make-me-a-closure-please syntax.

I agree that D1 behaviour should be the default, since otherwise it'll be yet another breaking change. However, I do understand that the D1 behaviour is the unsafe one, and as such the heap allocated version has merit as the default.

I believe the default should be the one that is most frequently used, even if it is less safe. Otherwise you may end up with a lot of code duplication.

I also think that scope and heap-allocated delegates should have different types, so that no implicit casting from a scope delegate to a heap one is possible. In that case a callee function that receives a delegate could demand that it be heap-allocated (because it stores it, for example).
Oct 25 2008
prev sibling next sibling parent "Jarrett Billingsley" <jarrett.billingsley gmail.com> writes:
On Sat, Oct 25, 2008 at 10:44 AM, Denis Koroskin <2korden gmail.com> wrote:
 I also think that scope and heap-allocated delegates should have different
 types so that no imlicit casting from scope delegate to heap one would be
 possible. In this case callee function that recieves the delegate might
 demand the delegate to be heap-allocated (because it stores it, for
 example).

Fantastic. That also neatly solves the "returning a delegate" problem; it simply becomes illegal to return a scope delegate.
Oct 25 2008
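[The safety problem such a rule would prevent is the classic escaping-frame bug, which with D1 semantics (stack-only frames) looks like this sketch:]

```d
// Unsafe under D1 semantics: the returned delegate points into
// bar's stack frame, which is dead after bar returns.
int delegate() bar()
{
    int x = 42;
    int get() { return x; }
    return &get;  // escaping reference to bar's frame
}

void use()
{
    auto dg = bar();
    auto y = dg();  // undefined behavior in D1; D2 "fixes" this by
                    // heap-allocating bar's frame instead
}
```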
prev sibling next sibling parent "Denis Koroskin" <2korden gmail.com> writes:
On Sat, 25 Oct 2008 21:17:34 +0400, Frits van Bommel  
<fvbommel remwovexcapss.nl> wrote:

 On Sat, Oct 25, 2008 at 10:44 AM, Denis Koroskin <2korden gmail.com>  
 wrote:
 I also think that scope and heap-allocated delegates should have  
 different
 types so that no imlicit casting from scope delegate to heap one would  
 be
 possible. In this case callee function that recieves the delegate might
 demand the delegate to be heap-allocated (because it stores it, for
 example).


How would this work? For example:

-----
struct Struct {
    // fields...
    void foo() {
        // body
    }
}

void bar(Struct* p) {
    auto dg = &p.foo;  // stack-based or heap-based delegate?
    // do stuff with dg
}
-----

?

Good question! First, let's expand the code:

void bar(Struct* p) {
    void delegate() dg;
    dg.ptr = p;
    dg.funcptr = &Struct.foo;
    // do stuff with dg
}

So, here is the question: is this a "stack-based or heap-based delegate"? I.e., may we return it from the function and pass it to functions that need a heap-based delegate, or not? Yes, we may obviously return it and call it outside of the function, so from this point of view it is indeed a "heap-allocated delegate", even if nothing is actually allocated. But someone might say that it is unsafe to call this dg, because at some point the object may become nonexistent. To answer that, let's rewrite the code to make it truly heap-allocated and see whether it got any safer:

void bar(Struct* p) {
    void foo() { p.foo(); }
    auto dg = &foo;
}

Now dg is heap-allocated (in the sense that the place for its local variables is allocated on the heap). May we return this delegate from the function? Yes. Is it any safer? No. They are absolutely the same.
      auto dg = &p.foo;	// stack-based or heap-based delegate?

Heap-based one, even if no actual allocation took place.
Oct 25 2008
prev sibling next sibling parent "Jarrett Billingsley" <jarrett.billingsley gmail.com> writes:
On Sat, Oct 25, 2008 at 1:17 PM, Frits van Bommel
<fvbommel remwovexcapss.nl> wrote:
 Fantastic.  That also neatly solves the "returning a delegate"
 problem; it simply becomes illegal to return a scope delegate.

Even if "scope delegate" becomes a different type, sometimes such a "scope delegate" is perfectly safe to return:

-----
alias scope void delegate() Dg;

Dg foo(Dg dg) {
    return dg;  // Why would this be illegal?
}
-----

Clarification - it would be an error to return a scope delegate from the scope in which it was declared. Currently the behavior you mention (passing a scope delegate into a function and then returning it) doesn't even exist for scope classes; parameters cannot be "scope". I would imagine, though, that if a parameter were scope, a function would be able to return that parameter, and in fact that would be the only way to return a scope reference (delegate or class) from a function.
Oct 25 2008
prev sibling parent "Bill Baxter" <wbaxter gmail.com> writes:
On Sun, Oct 26, 2008 at 11:38 PM, Steven Schveighoffer
<schveiguy yahoo.com> wrote:
 I also think that scope and heap-allocated delegates should have different
 types so that no imlicit casting from scope delegate to heap one would be
 possible. In this case callee function that recieves the delegate might
 demand the delegate to be heap-allocated (because it stores it, for
 example).

I've been thinking about this solution, and I think the decision to allocate on the stack or the heap should be left up to the developer, and no types should be assigned. Think about an example like this:

class DelegateCaller
{
    private int delegate() _foo;
    this(int delegate() foo) { _foo = foo; }
    int callit() { return _foo(); }
}

int f1()
{
    int x() { return 5; }
    scope dc = new DelegateCaller(&x); // allocate on stack
    return dc.callit() * dc.callit();
}

DelegateCaller f2()
{
    int x() { return 5; }
    return new DelegateCaller(&x); // allocate on heap
}

So what type should DelegateCaller._foo be?

Ok, so that's a good example where only the caller knows that heap allocation is necessary, and we already discussed a case where only the callee knows it's necessary.
 I think the only real solution to this, aside from compiler analysis (which
 introduces all kinds of problems), is to declare all delegates are stack or
 heap allocated by default, and allow the developer to deviate by declaring
 the delegate as opposite.

It seems to me that from the two cases above, a good solution might be to make stack the default but to allow *either* the callee *or* the caller to request that that default be overridden.
 As I think most function delegates are expected to be stack allocated, it
 makes sense to me that stack delegates should be the default.

 As a suggestion for syntax, I'd say heap-allocated delegates should use the
 new keyword somehow:

 return new DelegateCaller(new(&x));

 One issue to determine is how heap-allocated delegates are done.  Should
 there be only one heap allocation per function call, or one per
 instantiation?  If so, what happens if you change data in the function after
 instantiation? The difference is significant if you create multiple
 delegates:
 int delegate() foo[];

 int i = 0;
 int getI() { return i; }

 foo ~= new(&getI);

 i++;
 foo ~= new(&getI);
 i++;
 for(int j = 0; j < foo.length; j++)
 {
   writefln(foo[j]);
 }

 What should be the correct output?

 0
 1

Without thinking about implementation or the current behavior at all, this is the output I would expect from a full closure. It should capture the state at the time of its creation.

With the either/or proposal you'll need another rule, I think. If you have a case like this:

// here "new" means heap required
void longTermDelegateKeeper(new int delegate() dg) { ... }
...
int i = 0;
int getI() { return i; }
int delegate() foo[];
foo ~= &getI;
i++;
longTermDelegateKeeper(foo[0]); // <- what happens?

Here there are two options for the "what happens" line, I think:

1) the stack delegate returned by foo[0] triggers an implicit allocation
   and copying of the current stack variables (so foo[0]() will return "1")
2) compiler error: "Heap delegate expected"

I think #2 is the right answer here. Force the caller to explicitly create a heap delegate out of the stack delegate, and by doing that force the caller to examine which state he really means to capture in that delegate. Did he want it to capture the i==1 state, or did he want it to capture i==0? In a loop context it will also force the developer to notice that he's triggering implicit allocations inside a loop when he may not mean to.

It would also make it possible to recognize allocations just by looking at code locally. Aside from these D2 delegates, I think it's always possible to tell where the allocations are by looking at D code. Setting a .length or doing ~= are not obviously (and not necessarily) allocations, but if you see one then you can guess that allocation is involved. I don't want to end up in a situation where I have to guess whether the code I'm looking at is allocating merely because it calls a function that itself doesn't appear to do any allocation.

Finally -- do stack and heap delegates really need to be distinct types? Maybe not. Maybe a run-time check would be good enough. If there's some kind of isHeapDelegate(dg) check available, then library writers could use that. The compiler wouldn't catch the error, but it might be sufficient to catch it at runtime in order to avoid the pain of introducing more types.

--bb
Oct 26 2008
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Jason House wrote:
 I ported some monte carlo simulation code from Java to D2, and
 performance is horrible.
 
 34% of the execution time is used by std.random.uniform. To my great
 surprise, 25% of the execution  time is memory allocation (and
 collection) from that random call. The only candidate source I see is
 a call to ensure with lazy arguments. The memory allocation occurs at
 the start of the UniformDistribution call. I assume this is dynamic
 closure kicking in.
 
 Can anyone verify that this is the case?
 
 600000 memory allocations per second really kills performance!

std.random does not use dynamic memory allocation. Walter is almost done implementing static closures. Andrei
Oct 24 2008
next sibling parent reply "Bill Baxter" <wbaxter gmail.com> writes:
On Sat, Oct 25, 2008 at 9:59 AM, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
 Jason House wrote:
 I ported some monte carlo simulation code from Java to D2, and
 performance is horrible.

 34% of the execution time is used by std.random.uniform. To my great
 surprise, 25% of the execution  time is memory allocation (and
 collection) from that random call. The only candidate source I see is
 a call to ensure with lazy arguments. The memory allocation occurs at
 the start of the UniformDistribution call. I assume this is dynamic
 closure kicking in.

 Can anyone verify that this is the case?

 600000 memory allocations per second really kills performance!

std.random does not use dynamic memory allocation.

Well, the suggestion is that it may be using dynamic memory allocation without intending to, because of the dynamic closures. Are you saying that is definitely not the case?
 Walter is almost done implementing static closures.

Excellent! So what strategy is being used? I hope it's static by default, dynamic on request, but your wording suggests otherwise. --bb
Oct 24 2008
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Bill Baxter wrote:
 On Sat, Oct 25, 2008 at 9:59 AM, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 Jason House wrote:
 I ported some monte carlo simulation code from Java to D2, and
 performance is horrible.

 34% of the execution time is used by std.random.uniform. To my great
 surprise, 25% of the execution  time is memory allocation (and
 collection) from that random call. The only candidate source I see is
 a call to ensure with lazy arguments. The memory allocation occurs at
 the start of the UniformDistribution call. I assume this is dynamic
 closure kicking in.

 Can anyone verify that this is the case?

 600000 memory allocations per second really kills performance!


Well the suggestion is that it may be using dynamic memory allocation without intending to because of the dynamic closures. Are you saying that is definitely not the case?

I don't think there's any delegate in use in std.random.
 Walter is almost done implementing static closures.

Excellent! So what strategy is being used? I hope it's static by default, dynamic on request, but your wording suggests otherwise.

I forgot. Andrei
Oct 24 2008
parent reply Jason House <jason.james.house gmail.com> writes:
Andrei Alexandrescu wrote:

 I don't think there's any delegate in use in std.random.

Lazy arguments are delegates, and enforce uses lazy arguments
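Roughly what's going on, as a sketch (enforce's actual signature in Phobos may differ slightly):

```d
// A lazy parameter is sugar for a delegate parameter.
void enforce(bool value, lazy string msg)
{
    if (!value)
        throw new Exception(msg());
}

void check(int a, int b)
{
    // This call lowers to roughly:
    //   enforce(a < b, delegate string() { return "a must be < b"; });
    // Under the current dynamic closure implementation, forming a delegate
    // whose body might reference locals heap-allocates the enclosing frame
    // on every call -- even when the condition holds and msg() is never
    // evaluated.
    enforce(a < b, "a must be < b");
}
```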
Oct 24 2008
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Jason House wrote:
 Andrei Alexandrescu wrote:
 
 I don't think there's any delegate in use in std.random.

Lazy arguments are delegates, and enforce uses lazy arguments

Yikes, I see. Andrei
Oct 24 2008
prev sibling parent Jason House <jason.james.house gmail.com> writes:
Andrei Alexandrescu wrote:

 Jason House wrote:
 I ported some monte carlo simulation code from Java to D2, and
 performance is horrible.
 
 34% of the execution time is used by std.random.uniform. To my great
 surprise, 25% of the execution  time is memory allocation (and
 collection) from that random call. The only candidate source I see is
 a call to ensure with lazy arguments. The memory allocation occurs at
 the start of the UniformDistribution call. I assume this is dynamic
 closure kicking in.
 
 Can anyone verify that this is the case?
 
 600000 memory allocations per second really kills performance!

std.random does not use dynamic memory allocation.

This is exactly why so many have complained about the dynamic closure implementation. You did not intend to use dynamic memory allocation, but std.random definitely does. A program with nothing but a loop that calls uniform will show it plain as day in the profiler. (I'm using callgrind)
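For reference, a reduction like this is enough to see the allocations in the profiler (the exact uniform signature has varied between releases, so treat this as a sketch):

```d
import std.random;

void main()
{
    auto gen = Random(unpredictableSeed);
    ulong sum;
    // Nothing here should touch the GC, yet each uniform() call pays
    // for the closure built around enforce's lazy message argument.
    foreach (i; 0 .. 1_000_000)
        sum += uniform(0, 100, gen);
}
```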
 Walter is almost done 
 implementing static closures.

Ooh... Can you elaborate on that?
Oct 24 2008
prev sibling parent Russell Lewis <webmaster villagersonline.com> writes:
Objective 1: Make the heap vs. stack variables explicit
Objective 2: Make it impossible to return or store a static (stack) delegate
Objective 3: Don't require decorators on lambda expressions.

Solution:
- Variables are on stack by default
- Use modifier "heap" to put a variable on the heap
- Delegates can be normal (storable) or "scope" (can't live beyond the 
scope of our function); whether a delegate is scope is inferred BASED ON 
WHAT VARIABLES YOU ACCESS.


EXAMPLE CODE

void foo(scope void delegate()) {...}
void bar(void delegate()) {...}

void main()
{
   int a;
   heap int b;

   foo({ a = 1; });	// legal.
   bar({ b = 2; });	// legal.  bar could store dg, but b is on heap
   foo({ b = 3; });	// legal.  ok to pass non-scope dg to
			//         scope argument
   bar({ a = 4; });	// SYNTAX ERROR
			// delegate is scope b/c a is on stack, but
			// argument to bar isn't scope.
}

END CODE


Thoughts?
Nov 03 2008