www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Re: Errors in TDPL

reply bearophile <bearophileHUGS lycos.com> writes:

I optimized things such that the commonly used path (many calls to empty,
front, and popFront) is as fast as possible. The initial work will be amortized
for most loops.<

New timings: Running time, best of 3, seconds: test1: 0.31 test1 opt: 0.07 test2: 0.31 test2 opt: 0.12 test3: 6.38 test3 opt: 0.52 test3: 3.78 (new iota) test3 opt: 0.29 (new iota) test4: 4.70 test4 opt: 1.25 test4: 2.03 (new iota) test4 opt: 0.27 (new iota) Compile times opt version, seconds: test1: 0.05 test2: 0.05 test3: 0.28 test4: 0.23 (new iota) test4: 0.29 test4: 0.24 (new iota) (The difference in compile time between 0.29 and 0.23 is probably not significant. The difference between 0.05 and 0.23 is significant.) In my opinion the non-opt runtime timings too are important, because you can't compile code every time in opt mode. The normal for/foreach gives some performance even with no optmized compilation. It's good to optimize retro() so it can recognize the iota() argument and return just another iota().
On my machine, test4 is still 2x slower than foreach_reverse in an optimized

As you see on mine in an optimized build it's a bit slower than 2x than foreach_reverse and more than 4x slower than a normal for loop.
The disassembly reveals that the compiler refuses to inline some calls, which
is disappointing as their bodies are very simple.<

The code inside the loop here is very simple, it's just a variable (register) inc. In this case, in an optimized build GCC is able to optimize away the whole for loop. In my opinion a good D2 compiler has to to do the same with a foreach(retro(iota)) :-) If I am not wrong this is the inner loop of test4 with the new iota in opt build: L3F: mov ECX,018h[ESP] inc EBX add 010h[ESP],ECX mov EDX,010h[ESP] cmp EDX,014h[ESP] jne L3F It's not very good, but it's better than before. All those memory loads and stores are not necessary in this loop, they waste time. There is a call to this function, but it's before the call: _D3std5range13__T4IotaTiTiZ4Iota6__ctorMFNciiiZS3std5range13__T4IotaTiTiZ4Iota comdat L0: sub ESP,0Ch mov ECX,offset FLAT:_D3std5range13__T4IotaTiTiZ4Iota6__ctorMFNciiiZS3std5range13__T4IotaTiTiZ4Iota14__dgliteral607MFZAxa push EBX push EBP mov EBP,020h[ESP] push ESI mov ESI,EAX push EDI mov EDI,020h[ESP] test EDI,EDI mov 014h[ESP],EAX mov 018h[ESP],ECX je L28 cmp EBP,024h[ESP] jne L2C L28: xor EDX,EDX jmp short L31 L2C: mov EDX,1 L31: xor DL,1 je L5A push dword ptr FLAT:_DATA[054h] mov EDX,ECX push dword ptr FLAT:_DATA[050h] push 084Dh mov EAX,020h[ESP] mov EBX,020h[ESP] call EDX push EDX push EAX call near ptr _D3std9contracts7bailOutFAyaixAaZv L5A: test EDI,EDI mov [ESI],EBP mov 8[ESI],EDI jle L77 mov EDX,024h[ESP] dec EDX mov EAX,EDX mov 4[ESI],EDX sub EAX,EBP cdq idiv EDI sub 4[ESI],EDX jmp short L89 L77: mov ECX,024h[ESP] inc ECX mov EAX,ECX mov 4[ESI],ECX sub EAX,EBP cdq idiv EDI add 4[ESI],EDX L89: add 4[ESI],EDI mov EAX,ESI pop EDI pop ESI pop EBP pop EBX add ESP,0Ch ret 0Ch It's a lot of code, that's the iota constructor. The _D3std9contracts7bailOutFAyaixAaZv is probably the enforce or part of it. Doubling the loop size (N), the running time becomes about double, so such call to the constructor is not taking a lot of time (because there are no loops in the constructor). ------------------- I have seen that iota() contains: this(N current, N pastLast, S step) { enforce(step != 0 && current != pastLast); Indeed this throws: import std.range; void main() { int count; foreach (x; iota(10, 10, 1)) count++; } While his Python code gives no output:
 for x in xrange(10, 10, 1): print x

 for x in xrange(20, 10, 1): print x

In my opinion an empty loop is a valid loop, just as this is valid both syntactically and logically: for (int i = 10; i < 10; i++) {} for (int i = 20; i < 10; i++) {} So I suggest to replace that line in the iota() costructor with: enforce(step != 0); And then create an empty generator if pastLast <= current (and the step is 1). Bye, bearophile
Jun 22 2010
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 06/22/2010 07:18 AM, bearophile wrote:
 So I suggest to replace that line in the iota() costructor with:

 enforce(step != 0);

 And then create an empty generator if pastLast<= current (and the step is 1).

Absolutely. I wrote that test to simplify my life, but in the final version iota should accept empty ranges. Andrei
Jun 22 2010