
digitalmars.D.learn - how to benchmark pure functions?

reply ab <not_a_real_address nowhere.ab> writes:
Hi,

when trying to compare different implementations of the optimized 
builds of a pure function using benchmark from 
std.datetime.stopwatch, I get times equal to zero, I suppose 
because the functions are not executed as they do not have side 
effects.

The same happens with the example from the documentation:
https://dlang.org/library/std/datetime/stopwatch/benchmark.html

How can I prevent the compiler from removing the code I want to 
measure? Is there some utility in the standard library or pragma 
that I should use?

Thanks

AB
Oct 27 2022
next sibling parent reply Imperatorn <johan_forsberg_86 hotmail.com> writes:
On Thursday, 27 October 2022 at 17:17:01 UTC, ab wrote:
 Hi,

 when trying to compare different implementations of the 
 optimized builds of a pure function using benchmark from 
 std.datetime.stopwatch, I get times equal to zero, I suppose 
 because the functions are not executed as they do not have side 
 effects.

 The same happens with the example from the documentation:
 https://dlang.org/library/std/datetime/stopwatch/benchmark.html

 How can I prevent the compiler from removing the code I want to 
 measure? Is there some utility in the standard library or 
 pragma that I should use?

 Thanks

 AB
Sorry, I don't understand what you're saying. The examples work for me. Can you provide an exact code example which does not work as expected for you?
Oct 27 2022
parent "H. S. Teoh" <hsteoh qfbox.info> writes:
On Thu, Oct 27, 2022 at 06:20:10PM +0000, Imperatorn via Digitalmars-d-learn
wrote:
 On Thursday, 27 October 2022 at 17:17:01 UTC, ab wrote:
 Hi,
 
 when trying to compare different implementations of the optimized
 builds of a pure function using benchmark from
 std.datetime.stopwatch, I get times equal to zero, I suppose because
 the functions are not executed as they do not have side effects.
 
 The same happens with the example from the documentation:
 https://dlang.org/library/std/datetime/stopwatch/benchmark.html
 
 How can I prevent the compiler from removing the code I want to
 measure?  Is there some utility in the standard library or pragma
 that I should use?
[...]

To prevent the optimizer from eliding the function completely, you need
to do something with the return value.  Usually, this means you combine
the return value into some accumulating variable, e.g., if it's an int
function, have a running int accumulator that you add to:

    int funcToBeMeasured(...) pure { ... }

    int accum;
    auto results = benchmark!({
        // Don't just call funcToBeMeasured and ignore the value
        // here, otherwise the optimizer may delete the call
        // completely.
        accum += funcToBeMeasured(...);
    });

Then at the end of the benchmark, do something with the accumulated
value, like print out its value to stdout, so that the optimizer doesn't
notice that the value is unused, and decide to kill all previous
assignments to it.  Something like `writeln(accum);` at the end should
do the trick.

T

-- 
Indifference will certainly be the downfall of mankind, but who cares? -- Miquel van Smoorenburg
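For reference, the accumulator approach described above could look like the following as a complete program. This is only a minimal sketch: `sumOfSquares`, its argument and the iteration count are hypothetical stand-ins, not anything from the original question. Note that it only keeps the calls from being discarded as unused; an optimizer may still fold a call with a constant argument or hoist it out of the loop.

```d
import std.datetime.stopwatch : benchmark;
import std.stdio : writeln;

// Hypothetical pure function standing in for the code under test.
int sumOfSquares(int n) pure nothrow @nogc @safe
{
    int total = 0;
    foreach (i; 0 .. n)
        total += i * i;
    return total;
}

void main()
{
    // Running accumulator that consumes every result.
    // A long is used so that the sum does not wrap around.
    long accum;

    auto results = benchmark!({
        // Use the return value instead of discarding it.
        accum += sumOfSquares(100);
    })(10_000);

    writeln("elapsed: ", results[0]);

    // Print the accumulator so that the optimizer cannot treat
    // all the additions above as dead stores.
    writeln("accum: ", accum);
}
```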
Oct 27 2022
prev sibling next sibling parent reply Dennis <dkorpel gmail.com> writes:
On Thursday, 27 October 2022 at 17:17:01 UTC, ab wrote:
 How can I prevent the compiler from removing the code I want to 
 measure?
With many C compilers, you can use volatile assembly blocks for that. With LDC -O3, a regular assembly block also does the trick currently:

```D
void main()
{
    import std.datetime.stopwatch;
    import std.stdio: write, writeln, writef, writefln;
    import std.conv : to;

    void f0() {}

    void f1()
    {
        foreach(i; 0..4_000_000)
        {
            // nothing, loop gets optimized out
        }
    }

    void f2()
    {
        foreach(i; 0..4_000_000)
        {
            // defeat optimizations
            asm @safe pure nothrow @nogc {}
        }
    }

    auto r = benchmark!(f0, f1, f2)(1);
    writeln(r[0]); // 4 μs
    writeln(r[1]); // 4 μs
    writeln(r[2]); // 1 ms
}
```
Oct 27 2022
parent max haughton <maxhaton gmail.com> writes:
On Thursday, 27 October 2022 at 18:41:36 UTC, Dennis wrote:
 On Thursday, 27 October 2022 at 17:17:01 UTC, ab wrote:
 How can I prevent the compiler from removing the code I want 
 to measure?
 With many C compilers, you can use volatile assembly blocks for that. With LDC -O3, a regular assembly block also does the trick currently:
 [...]
FYI, I recommend a volatile data dependency rather than injecting volatile ASM into the code, i.e. don't modify the pure function, but rather make sure the result is actually used in the eyes of the compiler.
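One way such a volatile data dependency might be set up is with `volatileLoad` and `volatileStore` from `core.volatile`. The sketch below is an assumption about what is meant, not code from this thread: `mix`, `input` and `sink` are hypothetical names. The input is read through a volatile load, so the compiler cannot treat it as a known constant, and the result is written through a volatile store, so the call counts as used. The pure function itself stays unmodified.

```d
import core.volatile : volatileLoad, volatileStore;
import std.datetime.stopwatch : benchmark;
import std.stdio : writeln;

// Hypothetical pure function under test; it is left untouched.
uint mix(uint x) pure nothrow @nogc @safe
{
    x ^= x >> 16;
    x *= 0x45d9f3b;
    x ^= x >> 16;
    return x;
}

void main()
{
    uint input = 12_345; // only ever read through volatileLoad
    uint sink;           // only ever written through volatileStore

    auto results = benchmark!({
        // The volatile load hides the value from constant folding
        // and keeps the call from being hoisted out of the loop.
        immutable x = volatileLoad(&input);
        // The volatile store makes the result count as used.
        volatileStore(&sink, mix(x));
    })(1_000_000);

    writeln(results[0]);
}
```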
Oct 29 2022
prev sibling parent reply ab <not_a_real_address nowhere.ab> writes:
On Thursday, 27 October 2022 at 17:17:01 UTC, ab wrote:
 Hi,

 when trying to compare different implementations of the 
 optimized builds of a pure function using benchmark from 
 std.datetime.stopwatch, I get times equal to zero, I suppose 
 because the functions are not executed as they do not have side 
 effects.

 The same happens with the example from the documentation:
 https://dlang.org/library/std/datetime/stopwatch/benchmark.html

 How can I prevent the compiler from removing the code I want to 
 measure? Is there some utility in the standard library or 
 pragma that I should use?

 Thanks

 AB
Thanks to H.S. Teoh and Dennis for the suggestions, they both work. I like the empty asm block a bit more because it is less invasive, but it only works with ldc.

Imperatorn, see Dennis's code for an example. benchmark from std.datetime.stopwatch works, but at high optimization levels (-O2, -O3) the loop can be removed and the time brought down to 0 hnsecs. E.g. try "ldc2 -O3 -run dennis.d".

AB
Oct 28 2022
next sibling parent Imperatorn <johan_forsberg_86 hotmail.com> writes:
On Friday, 28 October 2022 at 09:48:14 UTC, ab wrote:
 On Thursday, 27 October 2022 at 17:17:01 UTC, ab wrote:
 [...]
 Thanks to H.S. Teoh and Dennis for the suggestions, they both work. [...]
 Imperatorn see Dennis code for an example. [...]
Yeah I didn't read carefully enough sorry 🌷
Oct 28 2022
prev sibling parent Siarhei Siamashka <siarhei.siamashka gmail.com> writes:
On Friday, 28 October 2022 at 09:48:14 UTC, ab wrote:
 Thanks to H.S. Teoh and Dennis for the suggestions, they both 
 work. I like the empty asm block a bit more because it is less 
 invasive, but it only works with ldc.
I used the volatileLoad/volatileStore functions to ensure that the compiler doesn't find a way to optimize out the code (for example, move repetitive calculations out of the loop or even do them at compile time) and the RDTSC/RDTSCP instruction via inline assembly for measurements:

https://gist.github.com/ssvb/5c926ed9bc755900fdaac3b71a0f7cfd

The goal was to have a very fast way to check (with no measurable overhead) whether reasonable optimization options had been supplied to the compiler.
Oct 28 2022