
digitalmars.D.ldc - Looking for more library optimization patterns

reply "Kai Nacke" <kai redstar.de> writes:
Hi all!

LDC includes the SimplifyDRuntimeCalls pass which should replace 
D library calls with more efficient ones. This is a clone of a 
former LLVM pass which did the same for C runtime calls.

An example is that printf("foo\n") is replaced by puts("foo").

Do you know of similar patterns in D which are worth implementing?
E.g. replacing std.stdio.writefln("foo") with 
std.stdio.writeln("foo").
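
For concreteness, a minimal sketch of that rewrite (both calls are real 
std.stdio functions; the wrapper functions are only illustrative):

import std.stdio;

void before() {
    // The format string has no format specifiers, so all the
    // formatting machinery in writefln is pure overhead here.
    writefln("foo");
}

void after() {
    // Same output, no format-string parsing.
    writeln("foo");
}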

Regards,
Kai
Jan 29 2014
parent reply "bearophile" <bearophileHUGS lycos.com> writes:
Kai Nacke:

 LDC includes the SimplifyDRuntimeCalls pass which should 
 replace D library calls with more efficient ones. This is a 
 clone of a former LLVM pass which did the same for C runtime 
 calls.

 An example is that printf("foo\n") is replaced by puts("foo").

 Do you know of similar patterns in D which are worth 
 implementing?
 E.g. replacing std.stdio.writefln("foo") with 
 std.stdio.writeln("foo")
A de-optimization I'd really like in LDC (and dmd) is to not call the 
run-time functions when you perform a vector operation on arrays 
statically known to be very short:

void main() {
    int[3] a, b;
    a[] += b[];
}

Just rewrite that as a plain for loop (as is done for cases where the 
run-time function is not available), and let ldc2 compile it on its 
own. See this thread in D.learn:
http://forum.dlang.org/thread/lqmqsnucadaqlkxkoffc forum.dlang.org

I'd like to replace code like:

size_t i = 0;
foreach (immutable j, const ref bj; bodies) {
    foreach (const ref bk; bodies[j + 1 .. $]) {
        foreach (immutable m; TypeTuple!(0, 1, 2))
            r[i][m] = bj.x[m] - bk.x[m];
        i++;
    }
}

With:

size_t i = 0;
foreach (immutable j, const ref bj; bodies)
    foreach (const ref bk; bodies[j + 1 .. $])
        r[i++][] = bj.x[] - bk.x[];

And keep the same performance, instead of seeing a significant 
slowdown.

Bye,
bearophile
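
For concreteness, a minimal hand-written sketch of the lowering being 
asked for here (what the front end could emit instead of the druntime 
array-op call; not what the compilers currently do):

void main() {
    int[3] a, b;
    // Equivalent scalar loop; with the length known at compile
    // time the optimizer is free to unroll or vectorize it.
    foreach (immutable i; 0 .. a.length)
        a[i] += b[i];
}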
Jan 30 2014
next sibling parent reply "Stanislav Blinov" <stanislav.blinov gmail.com> writes:
Being somewhat involved in the aforementioned thread, I second that. 
There seems to be no logical reason to insert a call to 
dSliceOpAssignOrWhateverItIsCalled when all the data is actually 
there. At least in -release -O3 builds :)

Granted, this may be tricky for arbitrarily-typed arrays, but for 
builtins with statically known array sizes? That's a definite 
win.
Jan 30 2014
parent reply "Kai Nacke" <kai redstar.de> writes:
Thanks! Those are good hints!
I am not sure if I can implement them soon, but I'll put them on my 
list for the pass.

If you have more of those hints: please post!

Regards,
Kai
Feb 10 2014
next sibling parent "Andrea Fontana" <nospam example.com> writes:
On Monday, 10 February 2014 at 19:51:33 UTC, Kai Nacke wrote:
 Thanks! Those are good hints!
 I am not sure if I can implement them soon, but I'll put them on 
 my list for the pass.

 If you have more of those hints: please post!

 Regards,
 Kai
I don't know if it applies here, but what about using appender instead 
of concatenation (for example, for concatenations inside loops)?

Andrea
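
A minimal sketch of the pattern being suggested (std.array.appender is 
the standard Phobos facility; the loop body is only illustrative):

import std.array : appender;

string joinLines(in string[] lines) {
    // result ~= line in a loop may reallocate and copy repeatedly;
    // an Appender manages its own capacity and amortizes that cost.
    auto buf = appender!string();
    foreach (line; lines) {
        buf ~= line;
        buf ~= "\n";
    }
    return buf.data;
}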
Feb 11 2014
prev sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
Kai Nacke:

 If you have more of those hints: please post!
There are plenty of them to find. But I think you need a strategy like 
this one to find them:
http://en.wikipedia.org/wiki/Brainstorming

D conferences should be good for this purpose.

Bye,
bearophile
Feb 11 2014
prev sibling next sibling parent reply "Ivan Kazmenko" <gassa mail.ru> writes:
On Thursday, 30 January 2014 at 18:36:09 UTC, bearophile wrote:
 A de-optimization I'd really like in LDC (and dmd) is to not 
 call the run-time functions when you perform a vector 
 operation on arrays statically known to be very short:

 void main() {
     int[3] a, b;
     a[] += b[];
 }

 And just rewrite that with a for loop (as done for cases where 
 the run time function is not available), and let ldc2 compile 
 it on its own.
For short loops, an unrolled version like:

a[0] += b[0];
a[1] += b[1];
a[2] += b[2];

may well be faster than a simple loop like the following one:

foreach (immutable i; 0 .. 3) {
    a[i] += b[i];
}

At least on x86/x86-64. Will that optimization happen too, at a later 
stage?

Ivan Kazmenko.
Feb 15 2014
parent reply "bearophile" <bearophileHUGS lycos.com> writes:
Ivan Kazmenko:

 For short loops, an unrolled version like
     a[0] += b[0];
     a[1] += b[1];
     a[2] += b[2];
 may well be faster than a simple loop as the following one:
     foreach (immutable i; 0..3) {
         a[i] += b[i];
     }
 At least on x86/64.
Yes, but ldc is plenty able to unroll small loops with length known 
at compile time.

Bye,
bearophile
Feb 15 2014
parent "Temtaime" <temtaime gmail.com> writes:
What about SSE?

Can

float[16] a, b;
a[] += b[];

be optimized to use only four addps instructions?
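
Not an answer to what the compilers emit today, but a minimal 
hand-written sketch of the four-packed-additions shape being asked 
about, assuming a target with SSE and a compiler that supports 
core.simd vector arithmetic (LDC and GDC do; DMD on x86_64):

import core.simd;

void main() {
    // Express the 16 floats as four 4-lane vectors up front;
    // each += below can then map to a single packed add (addps).
    float4[4] a, b;
    foreach (i; 0 .. 4)
        a[i] += b[i];
}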
Apr 18 2014
prev sibling parent "safety0ff" <safety0ff.dev gmail.com> writes:
On Thursday, 30 January 2014 at 18:36:09 UTC, bearophile wrote:
 A de-optimization I'd really like in LDC (and dmd) is to not 
 call the run-time functions when you perform a vector 
 operation on arrays statically known to be very short:
I just learned about this issue while looking at
http://rosettacode.org/wiki/Hamming_numbers#D
(Alternative version 2, opEquals).

It makes me sad to learn that slice/array operations get turned into 
monstrosities by the compiler. There are some huge wins to be had in 
this area.
Jul 30 2014