
digitalmars.D.learn - Default initialization of static array faster than void initialization

wolframw <wolframw protonmail.com> writes:
Hi,

Chapter 12.15.2 of the spec explains that void initialization of 
a static array can be faster than default initialization. This 
seems logical because the array entries don't need to be set to 
NaN. However, when I ran some tests for my matrix implementation, 
it seemed that the default-initialized array is quite a bit 
faster.
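A minimal sketch of the two initialization strategies the spec contrasts, assuming a `double` element type (the gist's actual element type is not shown in this thread). Default initialization fills every element with `double.init`, which is NaN. `= void` skips those stores and leaves the contents indeterminate until they are written.

```d
import std.math : isNaN;
import std.stdio : writeln;

void main()
{
    // Default initialization: each element is set to double.init (NaN).
    double[16] a;
    foreach (x; a)
        assert(isNaN(x));

    // Void initialization: no stores are emitted, so the contents are
    // indeterminate and must be written before they are read.
    double[16] b = void;
    b[] = 0.0;
    writeln(b[0]); // prints 0
}
```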

The code (and the disassembly) is at 
https://gist.github.com/wolframw/73f94f73a822c7593e0a7af411fa97ac

I compiled with dmd -O -inline -release -noboundscheck -mcpu=avx2 
and ran the tests with the m array being default-initialized in 
one run and void-initialized in another run.
The results:
Default-initialized: 245 ms, 495 μs, and 2 hnsecs
Void-initialized: 324 ms, 697 μs, and 2 hnsecs
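A hypothetical harness along these lines could be used to reproduce the comparison. `sumDefault`, `sumVoid` and `sink` are illustrative names, not taken from the gist. It uses `benchmark` from `std.datetime.stopwatch` and accumulates into a `__gshared` global, which makes it harder for the optimizer to drop the measured work.

```d
import std.datetime.stopwatch : benchmark;
import std.stdio : writeln;

// Global sink: accumulating into it makes it harder for the optimizer
// to discard the work being measured.
__gshared double sink = 0;

// Hypothetical workload: fill a 16-element static array and sum it.
double sumDefault()
{
    double[16] m;            // default-initialized: NaN stores first
    foreach (i, ref x; m)
        x = i;               // overwrite every element
    double s = 0;
    foreach (x; m)
        s += x;
    return s;
}

// Same workload, but the array is void-initialized.
double sumVoid()
{
    double[16] m = void;     // void-initialized: NaN stores skipped
    foreach (i, ref x; m)
        x = i;
    double s = 0;
    foreach (x; m)
        s += x;
    return s;
}

void main()
{
    auto r = benchmark!({ sink += sumDefault(); },
                        { sink += sumVoid(); })(1_000_000);
    writeln("Default-initialized: ", r[0]);
    writeln("Void-initialized:    ", r[1]);
}
```

Because every element is overwritten before it is read, an optimizing compiler may elide the NaN stores entirely. In that case the two variants would be expected to time about the same, and any difference comes from elsewhere in the generated code.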

What the heck?
I've also inspected the disassembly and found an interesting 
difference in the benchmark loop (annotated with "start of loop" 
and "end of loop" in both disassemblies). It seems to me like the 
compiler partially unrolled the loop in both cases, but in the 
default-initialized case it discards every second result of the 
multiplication and stores only the remaining results to the sink 
matrix. In the void-initialized version, it seems like each result 
is stored in the sink matrix.
I don't see how such a difference can be caused by the different 
initialization strategies. Is there something I'm not considering?
Also, if the compiler is smart enough to figure out that it can 
discard some of the results, why doesn't it just do away with the 
entire loop and run the multiplication only once? Since both 
input matrices are immutable and opBinary is pure, it is 
guaranteed that the result is always the same, isn't it?
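A hypothetical 2x2 matrix type illustrating the purity argument. `Mat2` is an illustrative name, not the gist's type. `opBinary` is marked `pure` and both operands are `immutable`, so every call with the same operands yields the same result. The language permits a compiler to exploit this (for example by hoisting the multiplication out of the loop) but does not require it, so whether DMD does so is an optimizer detail.

```d
import std.stdio : writeln;

// Hypothetical 2x2 row-major matrix: a = [r0c0, r0c1, r1c0, r1c1].
struct Mat2
{
    double[4] a;

    // pure: the result depends only on `this` and `rhs`.
    Mat2 opBinary(string op : "*")(const Mat2 rhs) const pure
    {
        Mat2 r = void; // every element is written below
        r.a[0] = a[0] * rhs.a[0] + a[1] * rhs.a[2];
        r.a[1] = a[0] * rhs.a[1] + a[1] * rhs.a[3];
        r.a[2] = a[2] * rhs.a[0] + a[3] * rhs.a[2];
        r.a[3] = a[2] * rhs.a[1] + a[3] * rhs.a[3];
        return r;
    }
}

void main()
{
    immutable x = Mat2([1.0, 2.0, 3.0, 4.0]);
    immutable y = Mat2([5.0, 6.0, 7.0, 8.0]);

    Mat2 sink;
    foreach (_; 0 .. 1000)
        sink = x * y; // same operands, same result on every iteration

    writeln(sink.a); // prints [19, 22, 43, 50]
}
```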
Nov 08 2019
Ali Çehreli <acehreli yahoo.com> writes:
One correction: I think you mean "using" a default-initialized array is 
faster (not the initialization itself).

Another observation: dmd -O makes both cases slower! Hm?

Ali
Nov 08 2019
kinke <noone nowhere.com> writes:
On Friday, 8 November 2019 at 16:49:37 UTC, wolframw wrote:
 I compiled with dmd -O -inline -release -noboundscheck 
 -mcpu=avx2 and ran the tests with the m array being 
 default-initialized in one run and void-initialized in another 
 run.
 The results:
 Default-initialized: 245 ms, 495 μs, and 2 hnsecs
 Void-initialized: 324 ms, 697 μs, and 2 hnsecs
If you care about performance, you're much better off with LDC or GDC. On my Win64 machine, DMD v2.089 takes 11.7 ms, while LDC v1.18 (`ldc2 -O -run gist.d`) takes 0.27 ms, which is about 43x faster.
Nov 08 2019