digitalmars.D.learn - DMD2 out parameters

Pete <nospam mailinator.com> writes:

Hi,

I'm not sure if this is already a widely known phenomenon but I ran across a
little gotcha yesterday regarding floating point out parameters using DMD2.

A year or so ago I wrote a ray tracer using DMD1. A few months ago I tried
compiling and running it using DMD2. It was 50% slower. This disappointed me
so much that I stopped using D2 until about a week ago. I spent a few hours
yesterday investigating why the D2 version of the code was so much worse than
the D1 version. After some head scratching and use of -profile and objconv, I
eventually managed to isolate the problem. It boiled down to this example:

float f;
func(f);

void func(out float ff) {
ff = 1;
}

This use of 'out' causes func to execute in around 250 ticks on DMD2. Change
'out' to 'ref' and it takes around 10 ticks (the same time as the 'out'
version executes on DMD1). If you initialise f to 0 before calling func then
it all works quickly again, which makes me wonder whether it's some strange
DMD2 NaN/FPU exception quirk that may be documented somewhere. When I
looked at the generated assembly I saw that both DMD1 and DMD2 seem to
generate the same thing (using -O -inline -release):

func  LABEL NEAR
push    ebp
mov     ebp, esp
push    eax       // eax = ptr to ff
fld     dword ptr [_nan]
fstp    dword ptr [eax]
fld     dword ptr [_one]
fstp    dword ptr [eax]

mov     esp, ebp
pop     ebp
ret

Now this code looks ok if you ignore the fact that 'ff' is being written to
twice, and the strange, seemingly redundant push of EAX.
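
A rough way to reproduce the 'out' versus 'ref' timing comparison described
above is sketched here. It is only an illustration: the names viaOut, viaRef
and iterations are made up for this example, and it assumes a current Phobos
with std.datetime.stopwatch, which is newer than the DMD releases discussed in
this thread. With -O -inline the calls may be inlined or removed entirely, so
treat the numbers as indicative at best.

```d
import std.datetime.stopwatch : StopWatch, AutoStart;
import std.stdio : writefln;

// Hypothetical stand-ins for func: identical bodies, differing only in
// the parameter storage class.
void viaOut(out float ff) { ff = 1; } // ff is reset to float.init on entry
void viaRef(ref float ff) { ff = 1; } // ff is left as the caller passed it

void main()
{
    enum iterations = 10_000_000; // arbitrary loop count (assumption)
    float f;

    // Time the 'out' version.
    auto sw = StopWatch(AutoStart.yes);
    foreach (i; 0 .. iterations)
        viaOut(f);
    auto outTime = sw.peek();

    // Restart the clock and time the 'ref' version.
    sw.reset();
    foreach (i; 0 .. iterations)
        viaRef(f);
    auto refTime = sw.peek();

    writefln("out: %s, ref: %s", outTime, refTime);
}
```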

Has anyone else come across this and if so is it a bug? I'm also interested in
people's thoughts on the strange code gen.

My D2 version is now running faster than the old D1 version by the way :)

Regards,
Pete.

Dec 23 2010

Pete <nospam mailinator.com> writes:

//If you initialise f to 0 before calling func then it all works quickly again

Actually I think this is a red herring. I don't think initialising f helps.

Dec 23 2010

Pete <nospam mailinator.com> writes:

Ok, I've done some more investigating and it appears that in DMD2 a float NaN is
0x7FE00000 (in dword format) but when it initialises a float 'out' parameter it
initialises it with 0x7FA00000. This causes an FPU trap which is where the time
is going. This looks like a bug to me. Can anyone confirm?

Thanks.

Dec 23 2010

Don <nospam nospam.com> writes:

Pete wrote:
 Ok, I've done some more investigating and it appears that in DMD2 a float NaN
 is 0x7FE00000 (in dword format) but when it initialises a float 'out'
 parameter it initialises it with 0x7FA00000. This causes an FPU trap which is
 where the time is going. This looks like a bug to me. Can anyone confirm?
 
 Thanks.

Yes, it sounds like a NaN-related performance issue. Note, though, that
the slowdown you experience is processor-model specific. It's a penalty
of ~250 cycles on a Pentium 4 with x87 instructions, but zero cycles on
many other processors. (In fact, it's also zero cycles with SSE on
Pentium 4!)
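
For reference, the two bit patterns quoted above differ in bit 22, the top
mantissa bit, which IEEE 754 single precision uses to tell quiet NaNs from
signaling ones. The sketch below classifies both values; isSignalingNaN is a
hypothetical helper written for this illustration, not a Phobos function. The
result would be consistent with the x87 load of the 'out' initialiser raising
an invalid-operation condition, though that link is an interpretation rather
than something measured here.

```d
import std.stdio : writefln;

// Hypothetical helper: true if 'bits' encodes a single-precision
// signaling NaN (exponent all ones, mantissa non-zero, quiet bit clear).
bool isSignalingNaN(uint bits)
{
    enum uint expMask  = 0x7F80_0000; // bits 30..23
    enum uint mantMask = 0x007F_FFFF; // bits 22..0
    enum uint quietBit = 0x0040_0000; // bit 22

    return (bits & expMask) == expMask
        && (bits & mantMask) != 0
        && (bits & quietBit) == 0;
}

void main()
{
    writefln("0x7FE00000 signaling: %s", isSignalingNaN(0x7FE0_0000));
    writefln("0x7FA00000 signaling: %s", isSignalingNaN(0x7FA0_0000));
}
```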

Dec 23 2010

Pete <nospam mailinator.com> writes:

I noticed this on an Intel Core 2. I skipped the Pentium 4 generation :)

Dec 23 2010

Johann MacDonagh <johann.macdonagh..no spam..gmail.com> writes:

On 12/23/2010 12:19 PM, Pete wrote:
 Ok, I've done some more investigating and it appears that in DMD2 a float NaN
 is 0x7FE00000 (in dword format) but when it initialises a float 'out'
 parameter it initialises it with 0x7FA00000. This causes an FPU trap which is
 where the time is going. This looks like a bug to me. Can anyone confirm?

 Thanks.

I just did a test with DMD 2.051 on Linux:

void F1(ref float a)
{
         a++;
}

void F2(out float a)
{
         a++;
}

void main()
{
         float a;
         float b;

         F1(a);
         F2(b);
}

And ASM:

080490e4 <_D3out2F1FKfZv>:
  80490e4:       55                      push   ebp
  80490e5:       8b ec                   mov    ebp,esp
  80490e7:       83 ec 04                sub    esp,0x4
  80490ea:       d9 e8                   fld1
  80490ec:       d8 00                   fadd   DWORD PTR [eax]
  80490ee:       d9 18                   fstp   DWORD PTR [eax]
  80490f0:       c9                      leave
  80490f1:       c3                      ret
  80490f2:       90                      nop
  80490f3:       90                      nop

080490f4 <_D3out2F2FJfZv>:
  80490f4:       55                      push   ebp
  80490f5:       8b ec                   mov    ebp,esp
  80490f7:       83 ec 04                sub    esp,0x4
  80490fa:       d9 05 00 81 05 08       fld    DWORD PTR ds:0x8058100
  8049100:       d9 18                   fstp   DWORD PTR [eax]
  8049102:       d9 e8                   fld1
  8049104:       d8 00                   fadd   DWORD PTR [eax]
  8049106:       d9 18                   fstp   DWORD PTR [eax]
  8049108:       c9                      leave
  8049109:       c3                      ret
  804910a:       90                      nop
  804910b:       90                      nop

0804910c <_Dmain>:
  804910c:       55                      push   ebp
  804910d:       8b ec                   mov    ebp,esp
  804910f:       83 ec 08                sub    esp,0x8
  8049112:       d9 05 00 81 05 08       fld    DWORD PTR ds:0x8058100
  8049118:       d9 5d f8                fstp   DWORD PTR [ebp-0x8]
  804911b:       d9 05 00 81 05 08       fld    DWORD PTR ds:0x8058100
  8049121:       d9 5d fc                fstp   DWORD PTR [ebp-0x4]
  8049124:       8d 45 f8                lea    eax,[ebp-0x8]
  8049127:       e8 b8 ff ff ff          call   80490e4 <_D3out2F1FKfZv>
  804912c:       8d 45 fc                lea    eax,[ebp-0x4]
  804912f:       e8 c0 ff ff ff          call   80490f4 <_D3out2F2FJfZv>
  8049134:       31 c0                   xor    eax,eax
  8049136:       c9                      leave
  8049137:       c3                      ret

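And the value stored at 0x8058100 is 0x7FA00000. As you can see, 'out' doesn't
force the loading and storing of a different NaN value: main initialises its
locals from the same address that F2 uses for the 'out' parameter.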

Of course, maybe the compiler should skip initializing a float that gets 
passed into a routine as an out parameter as its first use. E.g.

float a;
a = 1.0;

wouldn't generate two separate assignments.
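
As things stand in the language, the caller-side default initialisation can be
skipped by hand with a void initialiser, while the callee still resets an 'out'
parameter to float.init on entry. A minimal sketch of that distinction follows;
fill is a hypothetical function used only for illustration.

```d
import std.math : isNaN;
import std.stdio : writeln;

// Hypothetical callee: 'out' resets ff to float.init before the body runs.
void fill(out float ff) { ff = 1; }

void main()
{
    float a;         // default-initialised to float.init, which is a NaN
    assert(isNaN(a));

    float b = void;  // explicitly left uninitialised by the caller
    fill(b);         // the callee still stores float.init, then 1
    assert(b == 1);

    writeln(a, " ", b);
}
```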

Dec 23 2010

bearophile <bearophileHUGS lycos.com> writes:

Pete:

 Has anyone else come across this and if so is it a bug? I'm also interested in
 people's thoughts on the strange code gen.

Please put the whole explanation in a bug report. (If you don't want to write
the bug report then I'll write it myself.)

Bye,
bearophile

Dec 24 2010
