www.digitalmars.com         C & C++   DMDScript  

D.gnu - GDC generates invalid assembly around fiber yield operations (Not

reply "Liran Zvibel" <liran weka.io> writes:
Hi,

We are trying to port a large fiber based application to GDC.
Our application works well when compiled with DMD with 
optimizations.
It fails very quickly with GDC (even before we tried 
optimizations), and we were able to narrow it to how GDC treats 
yields.

Please look at the following small program (minor changes from 
the core.thread.Fiber example):

import std.stdio;
import core.thread;

enum numAddition = 3;
long globalSum = 0;

class DerivedFiber : Fiber
{
     int index;
     this(int _index)
     {
         index = _index;
         super( &run );
     }

private :
     void run()
     {
         foreach(int k; 0 .. numAddition) {
             globalSum += otherFunc();
             writefln("DerivedFiber(%d) iteration %d globalSum is 
%d", index, k, globalSum);
         }
     }

     long otherFunc() {
         yield();
         return index;
     }
}

int main()
{
     Fiber derived1 = new DerivedFiber(1);
     Fiber derived2 = new DerivedFiber(2);

     foreach(j; 0 .. (numAddition+1)) {
         derived1.call();
         derived2.call();
     }

     assert(globalSum == (1+2)*numAddition);
     return 0;
}


And the following output when compiling first with DMD and then 
with GDC:



DerivedFiber(1) iteration 0 globalSum is 1
DerivedFiber(2) iteration 0 globalSum is 3
DerivedFiber(1) iteration 1 globalSum is 4
DerivedFiber(2) iteration 1 globalSum is 6
DerivedFiber(1) iteration 2 globalSum is 7
DerivedFiber(2) iteration 2 globalSum is 9


DerivedFiber(1) iteration 0 globalSum is 1
DerivedFiber(2) iteration 0 globalSum is 2
DerivedFiber(1) iteration 1 globalSum is 2
DerivedFiber(2) iteration 1 globalSum is 4
DerivedFiber(1) iteration 2 globalSum is 3
DerivedFiber(2) iteration 2 globalSum is 6
core.exception.AssertError fiber.d(41): Assertion failure
[ Removed the very simple stack...]


When looking at the generated assembly, it's very easy to see 
that the value of 'globalSum' is read to register before the call 
to `otherFunc` (and also does not refresh it across the 
'foreach'). The problem is that otherFunc calls 'yield' which 
changes context and causes 'globalSum' to be updated.
GDC has to know that after `yield` is called memory is 
potentially clobbered, and re-read memory back to registers.

I tried some things that did not help:
- declaring 'globalSum' as 'shared' or '__gshared'.
- declaring 'globalSum' as 'static' inside DerivedFiber, also 
tried adding 'shared' or '__gshared'.
- adding 'asm  {"nop;" :  :  : "memory" ;}' after the 'yield'.
- defining 'otherFunc' with  attribute("returns_twice") -- got 
'error: unknown attribute returns_twice'.

Two things did work, but are unpractical when porting a big 
application (over 100k loc) that heavily relies on Fibers:
- using a temporary variable to hold the 'otherFunc' return, then 
add it to 'globalSum'
- calling core.atomic.otomicOp!"+="(globalSum, otherFunc())


We COULD identify all functions that immediately call 'yield' and 
mark them with  attribute("returns_twice"), but that does not 
seem to be supported by GDC.

Any ideas?

Thanks!

Liran
May 03 2015
next sibling parent "Liran Zvibel" <liran weka.io> writes:
Hi,

Another thing that I noticed about this issue, is that in this 
isolated test case, the problem only happens if the added value 
is returned from the function that performs the 'yield()'.
If I call 'yield()' directly, then add to 'globalSum', or if I 
just call 'otherFunc()', ignore the return value and then add to 
'globalSum' then the result is correct.

An example of run version that works:
     void run()
     {
         foreach(int k; 0 .. numAddition) {
             otherFunc();
             globalSum += index;
             writefln("DerivedFiber(%d) iteration %d globalSum is 
%d", index, k, globalSum);
         }
     }

An example of run version that breaks (exactly like the one from 
my previous mail):
     void run()
     {
         foreach(int k; 0 .. numAddition) {
             globalSum += otherFunc();
             writefln("DerivedFiber(%d) iteration %d globalSum is 
%d", index, k, globalSum);
         }
     }

Maybe this can help you understand something ...

Thanks,

Liran
May 05 2015
prev sibling parent reply "Liran Zvibel" <liran weka.io> writes:
Hi,

I forgot to mention is before, I'm running with GDC commit 
b022dd4cac195d85e9c3a6a37f2501a07ade455a from April 7th based on 
GCC 4.9.

Does anyone have experience compiling (and then running) Fiber 
based applications with GDC?
What are you doing?

Is there a plan to add support for  attribute("returns_twice") to 
GDC?

Thanks!

Liran

On Sunday, 3 May 2015 at 18:45:36 UTC, Liran Zvibel wrote:
 Hi,

 We are trying to port a large fiber based application to GDC.
 Our application works well when compiled with DMD with 
 optimizations.
 It fails very quickly with GDC (even before we tried 
 optimizations), and we were able to narrow it to how GDC treats 
 yields.

 Please look at the following small program (minor changes from 
 the core.thread.Fiber example):

 import std.stdio;
 import core.thread;

 enum numAddition = 3;
 long globalSum = 0;

 class DerivedFiber : Fiber
 {
     int index;
     this(int _index)
     {
         index = _index;
         super( &run );
     }

 private :
     void run()
     {
         foreach(int k; 0 .. numAddition) {
             globalSum += otherFunc();
             writefln("DerivedFiber(%d) iteration %d globalSum 
 is %d", index, k, globalSum);
         }
     }

     long otherFunc() {
         yield();
         return index;
     }
 }

 int main()
 {
     Fiber derived1 = new DerivedFiber(1);
     Fiber derived2 = new DerivedFiber(2);

     foreach(j; 0 .. (numAddition+1)) {
         derived1.call();
         derived2.call();
     }

     assert(globalSum == (1+2)*numAddition);
     return 0;
 }


 And the following output when compiling first with DMD and then 
 with GDC:



 DerivedFiber(1) iteration 0 globalSum is 1
 DerivedFiber(2) iteration 0 globalSum is 3
 DerivedFiber(1) iteration 1 globalSum is 4
 DerivedFiber(2) iteration 1 globalSum is 6
 DerivedFiber(1) iteration 2 globalSum is 7
 DerivedFiber(2) iteration 2 globalSum is 9


 DerivedFiber(1) iteration 0 globalSum is 1
 DerivedFiber(2) iteration 0 globalSum is 2
 DerivedFiber(1) iteration 1 globalSum is 2
 DerivedFiber(2) iteration 1 globalSum is 4
 DerivedFiber(1) iteration 2 globalSum is 3
 DerivedFiber(2) iteration 2 globalSum is 6
 core.exception.AssertError fiber.d(41): Assertion failure
 [ Removed the very simple stack...]


 When looking at the generated assembly, it's very easy to see 
 that the value of 'globalSum' is read to register before the 
 call to `otherFunc` (and also does not refresh it across the 
 'foreach'). The problem is that otherFunc calls 'yield' which 
 changes context and causes 'globalSum' to be updated.
 GDC has to know that after `yield` is called memory is 
 potentially clobbered, and re-read memory back to registers.

 I tried some things that did not help:
 - declaring 'globalSum' as 'shared' or '__gshared'.
 - declaring 'globalSum' as 'static' inside DerivedFiber, also 
 tried adding 'shared' or '__gshared'.
 - adding 'asm  {"nop;" :  :  : "memory" ;}' after the 'yield'.
 - defining 'otherFunc' with  attribute("returns_twice") -- got 
 'error: unknown attribute returns_twice'.

 Two things did work, but are unpractical when porting a big 
 application (over 100k loc) that heavily relies on Fibers:
 - using a temporary variable to hold the 'otherFunc' return, 
 then add it to 'globalSum'
 - calling core.atomic.otomicOp!"+="(globalSum, otherFunc())


 We COULD identify all functions that immediately call 'yield' 
 and mark them with  attribute("returns_twice"), but that does 
 not seem to be supported by GDC.

 Any ideas?

 Thanks!

 Liran
May 14 2015
parent reply Johannes Pfau <nospam example.com> writes:
Am Thu, 14 May 2015 15:52:05 +0000
schrieb "Liran Zvibel" <liran weka.io>:

 Hi,
 
 I forgot to mention is before, I'm running with GDC commit 
 b022dd4cac195d85e9c3a6a37f2501a07ade455a from April 7th based on 
 GCC 4.9.
 
 Does anyone have experience compiling (and then running) Fiber 
 based applications with GDC?
 What are you doing?
 
 Is there a plan to add support for  attribute("returns_twice") to 
 GDC?
 
 Thanks!
I can reproduce this using the latest master branch. This exact issue is kinda weird: GCC can't cache the globalSum across a call to otherFunc. otherFunc calls yield which is in a different module. yield could modify the globalSum value and the generated code would be invalid even without Fibers. Iain: I think our codegen might be wrong. -fdump-tree-original: 61 var_decl name: 76 mngl: 77 type: 60 scpe: 54 srcp: test.d:5 size: 39 algn: 64 used: 1 62 plus_expr type: 60 op 0: 61 op 1: 78 78 call_expr type: 60 fn : 96 0 : 23 Does GCC guarantee an evaluation order for plus_expr? This is kinda weird btw. The rewrite might already happen in the frontend. If we rewrite globalSum += otherFunc(); to globalSum = globalSum + otherFunc(); and follow strict LTR evaluation the code we generate is correct and the code DMD generates is incorrect. OTOH I don't know the exact rules for += but intuitively it should first evaluate the RHS, then load the LHS. Note: The issue the OP assumed is also worth discussing. It can't happen with global variables, but a second Fiber could modify a local variable. I'm not sure if it's possible to construct a case where GCC legitimately assumes a variable can't escape and we modify it through yield.
May 14 2015
next sibling parent reply Johannes Pfau <nospam example.com> writes:
Am Thu, 14 May 2015 19:02:48 +0200
schrieb Johannes Pfau <nospam example.com>:

 ...
TLDR As a workaround replace globalSum += otherFunc(); which GDC currently treats as globalSum = globalSum + otherFunc(); with globalSum = otherFunc() + globalSum;
May 15 2015
next sibling parent "Iain Buclaw via D.gnu" <d.gnu puremagic.com> writes:
On 15 May 2015 at 09:08, Johannes Pfau via D.gnu <d.gnu puremagic.com> wrote:
 Am Thu, 14 May 2015 19:02:48 +0200
 schrieb Johannes Pfau <nospam example.com>:

 ...
TLDR As a workaround replace globalSum += otherFunc(); which GDC currently treats as globalSum = globalSum + otherFunc(); with globalSum = otherFunc() + globalSum;
That is an interesting workaround, should we perhaps reconsider our code generation here? Iain.
May 15 2015
prev sibling next sibling parent reply "Iain Buclaw via D.gnu" <d.gnu puremagic.com> writes:
On 15 May 2015 at 09:13, Iain Buclaw <ibuclaw gdcproject.org> wrote:
 On 15 May 2015 at 09:08, Johannes Pfau via D.gnu <d.gnu puremagic.com> wrote:
 Am Thu, 14 May 2015 19:02:48 +0200
 schrieb Johannes Pfau <nospam example.com>:

 ...
TLDR As a workaround replace globalSum += otherFunc(); which GDC currently treats as globalSum = globalSum + otherFunc(); with globalSum = otherFunc() + globalSum;
That is an interesting workaround, should we perhaps reconsider our code generation here?
I can confirm that C codegen does infact emit 'foo += bar()' as 'foo = bar() + foo' Which only strengthens the reasoning to change it. Liran, can you raise a bug report? Also, can we use your small sample (names will be anonymised) to put into the testsuite? Regards Iain.
May 15 2015
parent Johannes Pfau <nospam example.com> writes:
Am Fri, 15 May 2015 09:23:33 +0200
schrieb "Iain Buclaw via D.gnu" <d.gnu puremagic.com>:

 On 15 May 2015 at 09:13, Iain Buclaw <ibuclaw gdcproject.org> wrote:
 On 15 May 2015 at 09:08, Johannes Pfau via D.gnu
 <d.gnu puremagic.com> wrote:
 Am Thu, 14 May 2015 19:02:48 +0200
 schrieb Johannes Pfau <nospam example.com>:

 ...
TLDR As a workaround replace globalSum += otherFunc(); which GDC currently treats as globalSum = globalSum + otherFunc(); with globalSum = otherFunc() + globalSum;
That is an interesting workaround, should we perhaps reconsider our code generation here?
I can confirm that C codegen does infact emit 'foo += bar()' as 'foo = bar() + foo' Which only strengthens the reasoning to change it. Liran, can you raise a bug report? Also, can we use your small sample (names will be anonymised) to put into the testsuite? Regards Iain.
BTW: For commutative operations we can simply change the operands. For non-commutative operations we'll have to explicitly evaluate the side effects of the RHS before assigning. (-=, ...) Relevant code is in d-elem.cc: AddAssignExp::toElem etc
May 15 2015
prev sibling next sibling parent "Liran Zvibel via D.gnu" <d.gnu puremagic.com> writes:
Please feel free to use my code sample in the test suite.
I fill post a bug report.

Thanks!


Liran
 On May 15, 2015, at 10:23, Iain Buclaw via D.gnu <d.gnu puremagic.com> wrote:
 
 On 15 May 2015 at 09:13, Iain Buclaw <ibuclaw gdcproject.org> wrote:
 On 15 May 2015 at 09:08, Johannes Pfau via D.gnu <d.gnu puremagic.com> wrote:
 Am Thu, 14 May 2015 19:02:48 +0200
 schrieb Johannes Pfau <nospam example.com>:
 
 ...
TLDR As a workaround replace globalSum += otherFunc(); which GDC currently treats as globalSum = globalSum + otherFunc(); with globalSum = otherFunc() + globalSum;
That is an interesting workaround, should we perhaps reconsider our code generation here?
I can confirm that C codegen does infact emit 'foo += bar()' as 'foo = bar() + foo' Which only strengthens the reasoning to change it. Liran, can you raise a bug report? Also, can we use your small sample (names will be anonymised) to put into the testsuite? Regards Iain.
May 15 2015
prev sibling parent "Liran Zvibel via D.gnu" <d.gnu puremagic.com> writes:
Hi,
I’m trying to port a 120k loc fiber based application from dmd to gdc, so
applying this work around is not a good option for me (also, there are many
engineers that will continue generating code…)

Thanks
Liran
 On May 15, 2015, at 10:08, Johannes Pfau via D.gnu <d.gnu puremagic.com> wrote:
 
 Am Thu, 14 May 2015 19:02:48 +0200
 schrieb Johannes Pfau <nospam example.com>:
 
 ...
TLDR As a workaround replace globalSum += otherFunc(); which GDC currently treats as globalSum = globalSum + otherFunc(); with globalSum = otherFunc() + globalSum;
May 15 2015
prev sibling parent ketmar <ketmar ketmar.no-ip.org> writes:
On Thu, 14 May 2015 19:02:48 +0200, Johannes Pfau wrote:

 OTOH I don't know the exact rules for +=3D but intuitively it should firs=
t
 evaluate the RHS, then load the LHS.
this is not the case for `~=3D` (see [1]). yet i believe that there will be= =20 myriads of reasons from DMD core team to decide that `+=3D` is very special= =20 (or `~=3D` is very special), and consistency sux. [1] https://issues.dlang.org/show_bug.cgi?id=3D13670=
May 18 2015