digitalmars.D.learn - DMD2 out parameters

Pete <nospam mailinator.com> writes:
Hi,

I'm not sure if this is already a widely known phenomenon but I ran across a
little gotcha yesterday regarding floating point out parameters using DMD2.

A year or so ago I wrote a ray tracer using DMD1. A few months ago I tried
compiling and running it using DMD2. It was 50% slower. This disappointed me
so much that I stopped using D2 until about a week ago. I spent a few hours
yesterday investigating why the D2 version of the code was so much worse than
the D1 version. After some head scratching and use of -profile and objconv, I
eventually managed to isolate the problem. It boiled down to this example:

void func(out float ff) {
    ff = 1;
}

void main() {
    float f;
    func(f);
}

This use of 'out' causes func to execute in around 250 ticks on DMD2. Change
'out' to 'ref' and it takes around 10 ticks (the same time as the 'out'
version takes on DMD1). If you initialise f to 0 before calling func, it
all runs quickly again, which makes me wonder whether it's some strange
DMD2 NaN/FPU exception quirk that may be documented somewhere. When I
looked at the generated assembly, I saw that both DMD1 and DMD2 seem to
generate the same thing (using -O -inline -release):

func  LABEL NEAR
push    ebp
mov     ebp, esp
push    eax       // eax = ptr to ff
fld     dword ptr [_nan]
fstp    dword ptr [eax]
fld     dword ptr [_one]
fstp    dword ptr [eax]

mov     esp, ebp
pop     ebp
ret

Now this code looks OK if you ignore the fact that 'ff' is being written to
twice, and the strange, seemingly redundant push of EAX.
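
For what it's worth, the first of the two stores is what 'out' is specified to
do in D: the parameter is reset to its type's default initializer on function
entry, and for float that default is a NaN. A minimal sketch of the difference
between 'ref' and 'out' (viaRef and viaOut are names made up for this
illustration, not taken from the ray tracer):

```d
import std.math : isNaN;

// 'ref' passes the caller's variable through untouched.
void viaRef(ref float a)
{
    a += 1;
}

// 'out' resets the caller's variable to float.init (a NaN) on entry,
// so the increment below operates on NaN, not on the caller's value.
void viaOut(out float a)
{
    a += 1;
}

void main()
{
    float x = 1.0f;
    float y = 1.0f;

    viaRef(x);
    viaOut(y);

    assert(x == 2.0f); // the caller's 1.0 was incremented
    assert(isNaN(y));  // the caller's 1.0 was overwritten with NaN first
}
```

So the NaN store in the listing above is expected behaviour for 'out'. Whether
the compiler could drop it when the function body overwrites the parameter
straight away is a separate optimisation question.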

Has anyone else come across this and if so is it a bug? I'm also interested in
people's thoughts on the strange code gen.

My D2 version is now running faster than the old D1 version by the way :)

Regards,
Pete.
Dec 23 2010
Pete <nospam mailinator.com> writes:
//If you initialise f to 0 before calling func then it all works quickly again

Actually, I think this is a red herring. I don't think initialising f helps.
Dec 23 2010
Pete <nospam mailinator.com> writes:
OK, I've done some more investigating and it appears that in DMD2 a float NaN is
0x7FE00000 (in dword format), but when it initialises a float 'out' parameter it
initialises it with 0x7FA00000. This causes an FPU trap, which is where the time
is going. This looks like a bug to me. Can anyone confirm?
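
The two bit patterns can be told apart with a short sketch, assuming the usual
IEEE 754 single-precision layout as used on x86 (8 exponent bits, 23 mantissa
bits, top mantissa bit set meaning "quiet"). The helper names isNaNBits and
isQuietNaNBits are made up for this illustration:

```d
import std.stdio : writefln;

// True when the 32-bit pattern encodes a single-precision NaN:
// all exponent bits set and a non-zero mantissa.
bool isNaNBits(uint bits)
{
    return (bits & 0x7F80_0000) == 0x7F80_0000
        && (bits & 0x007F_FFFF) != 0;
}

// True when the NaN is quiet, i.e. the top mantissa bit (bit 22) is set.
// A NaN with that bit clear is a signaling NaN on x86.
bool isQuietNaNBits(uint bits)
{
    return isNaNBits(bits) && (bits & 0x0040_0000) != 0;
}

void main()
{
    foreach (bits; [0x7FE0_0000u, 0x7FA0_0000u])
    {
        writefln("0x%08X: NaN=%s quiet=%s",
                 bits, isNaNBits(bits), isQuietNaNBits(bits));
    }
}
```

Under that layout, 0x7FE00000 is a quiet NaN and 0x7FA00000 is a signaling
NaN, which would be consistent with the x87 load of the second value taking a
slow path.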

Thanks.
Dec 23 2010
Don <nospam nospam.com> writes:
Pete wrote:
 Ok, i've done some more investigating and it appears that in DMD2 a float NaN is
 0x7FE00000 (in dword format) but when it initialises a float 'out' parameter it
 initialises it with 0x7FA00000H. This causes an FPU trap which is where the time
 is going. This looks like a bug to me. Can anyone confirm?
 
 Thanks.

Yes, it sounds like a NaN-related performance issue. Note, though, that the
slowdown you experience is processor-model specific. It's a penalty of ~250
cycles on a Pentium 4 with x87 instructions, but zero cycles on many other
processors. (In fact, it's also zero cycles with SSE on Pentium 4!)
Dec 23 2010
Pete <nospam mailinator.com> writes:
I noticed this on an Intel Core 2. I skipped the Pentium 4 generation :)
Dec 23 2010
Johann MacDonagh <johann.macdonagh..no spam..gmail.com> writes:
On 12/23/2010 12:19 PM, Pete wrote:
 Ok, i've done some more investigating and it appears that in DMD2 a float NaN is
 0x7FE00000 (in dword format) but when it initialises a float 'out' parameter it
 initialises it with 0x7FA00000H. This causes an FPU trap which is where the time
 is going. This looks like a bug to me. Can anyone confirm?

 Thanks.

I just did a test with DMD 2.051 on Linux:

    void F1(ref float a)
    {
        a++;
    }

    void F2(out float a)
    {
        a++;
    }

    void main()
    {
        float a;
        float b;

        F1(a);
        F2(b);
    }

And ASM:

    080490e4 <_D3out2F1FKfZv>:
     80490e4:  55                    push   ebp
     80490e5:  8b ec                 mov    ebp,esp
     80490e7:  83 ec 04              sub    esp,0x4
     80490ea:  d9 e8                 fld1
     80490ec:  d8 00                 fadd   DWORD PTR [eax]
     80490ee:  d9 18                 fstp   DWORD PTR [eax]
     80490f0:  c9                    leave
     80490f1:  c3                    ret
     80490f2:  90                    nop
     80490f3:  90                    nop

    080490f4 <_D3out2F2FJfZv>:
     80490f4:  55                    push   ebp
     80490f5:  8b ec                 mov    ebp,esp
     80490f7:  83 ec 04              sub    esp,0x4
     80490fa:  d9 05 00 81 05 08     fld    DWORD PTR ds:0x8058100
     8049100:  d9 18                 fstp   DWORD PTR [eax]
     8049102:  d9 e8                 fld1
     8049104:  d8 00                 fadd   DWORD PTR [eax]
     8049106:  d9 18                 fstp   DWORD PTR [eax]
     8049108:  c9                    leave
     8049109:  c3                    ret
     804910a:  90                    nop
     804910b:  90                    nop

    0804910c <_Dmain>:
     804910c:  55                    push   ebp
     804910d:  8b ec                 mov    ebp,esp
     804910f:  83 ec 08              sub    esp,0x8
     8049112:  d9 05 00 81 05 08     fld    DWORD PTR ds:0x8058100
     8049118:  d9 5d f8              fstp   DWORD PTR [ebp-0x8]
     804911b:  d9 05 00 81 05 08     fld    DWORD PTR ds:0x8058100
     8049121:  d9 5d fc              fstp   DWORD PTR [ebp-0x4]
     8049124:  8d 45 f8              lea    eax,[ebp-0x8]
     8049127:  e8 b8 ff ff ff        call   80490e4 <_D3out2F1FKfZv>
     804912c:  8d 45 fc              lea    eax,[ebp-0x4]
     804912f:  e8 c0 ff ff ff        call   80490f4 <_D3out2F2FJfZv>
     8049134:  31 c0                 xor    eax,eax
     8049136:  c9                    leave
     8049137:  c3                    ret

And 0x8058100 is 0x7FA00000. As you can see, out doesn't force the loading and
storing of a different NaN value.

Of course, maybe the compiler should skip initializing a float that gets passed
into a routine as an out parameter as its first use. E.g.

    float a;
    a = 1.0;

wouldn't generate two separate assignments.
Dec 23 2010
bearophile <bearophileHUGS lycos.com> writes:
Pete:

 Has anyone else come across this and if so is it a bug? I'm also interested in
 people's thoughts on the strange code gen.

Please add the whole explanation in a bug report. (If you don't want to write
the bug report then I'll write it myself.)

Bye,
bearophile
Dec 24 2010