digitalmars.D.learn - Inline assembly and Profiling

Matthew Dudley <pontifechs gmail.com> writes:
I'm working on a chess engine side-project, and I'm starting to 
get into profiling and optimization.

One of the optimizations I've made involves some inline assembly, 
and I ran across some apparently bizarre behavior today, and I 
just wanted to double-check that I'm not doing something wrong.

Here's the behavior boiled down:

import std.stdio;
ubyte LS1B(ulong board)
{
   asm
   {
     bsf RAX, board;
   }
}

void main()
{
   auto one = 0x939839FA;
   assert(one.LS1B == 1, "Wrong LS1B!");
}

If I run this through DMD without profiling on, it runs 
successfully, but with profiling on, the assertion fails. And in 
the actual code, it returns seemingly random numbers.

Is the profiling code stomping on my toes here? Am I not allowed 
to just use a single instruction that writes into RAX like this 
with profiling on? Or is this just a compiler bug?
Feb 29 2016
Marco Leise <Marco.Leise gmx.de> writes:
In reply to Matthew Dudley <pontifechs gmail.com> (Tue, 01 Mar 2016 02:30:04 +0000):

I didn't check the documentation, but I believe you have to store RAX into some variable and return that when you use inline assembly.

In any case you should report a bug about this. If this code is correct, then DMD assumes you implicitly set the return value inside the asm block, and the profiling instrumentation should save RAX. If this is not intended, then the function is missing a return statement.

Alternatively you can turn this into a naked function by starting your asm block with "naked" and adding an explicit "ret" at the end. Naked asm means that the function contains only the instructions you have explicitly written down, circumventing the profiling instrumentation.

Either way, functions with DMD-style inline assembly cannot be inlined at all, which means you are better off looking into the core.bitops compiler intrinsics.

Also, code coverage or profiling (I forgot which one) used to not work in multi-threaded code! What I typically do is compile on Linux with GDC or LDC and use an external sampling profiler such as OProfile. You will need to change some compiler settings (no inlining, debug information, keep frame pointers) so that the function call stack can actually be reasoned about. After a profile run you can then display the result in various ways. At first these are confusing, but you'll get the hang of it after a while. For example, you could display sample counts per line of code, or display a call graph which tells you the time spent in a function separated by call site.

OProfile, being a system profiler, is not limited to your program. It can include time spent in kernel functions or just profile the whole system at once.

-- Marco
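
For illustration, here is a minimal sketch of two of the suggestions above: the core.bitops intrinsic, and an inline-assembly variant that stores RAX into a local and returns it explicitly. It assumes an x86-64 target and a druntime in which core.bitops.bsf accepts a ulong. The names LS1BAsm and index are made up for this example and are not from the original code. It has not been checked against the -profile behavior described in the question.

```d
import core.bitops : bsf;

/// Intrinsic-based version: index of the lowest set bit (0 = least significant).
/// As with the BSF instruction, the result is undefined for board == 0.
ubyte LS1B(ulong board)
{
    return cast(ubyte) bsf(board);
}

version (D_InlineAsm_X86_64)
{
    /// Inline-assembly version (hypothetical name) that copies the result
    /// into a local and returns it with an explicit return statement,
    /// instead of relying on whatever is left in RAX at the end.
    ubyte LS1BAsm(ulong board)
    {
        ulong index;
        asm
        {
            bsf RAX, board;   // scan for the lowest set bit
            mov index, RAX;   // save the result before the asm block ends
        }
        return cast(ubyte) index;
    }
}

void main()
{
    // Low byte 0xFA is 0b1111_1010, so the lowest set bit is bit 1.
    auto one = 0x939839FA;
    assert(one.LS1B == 1, "Wrong LS1B!");
    assert(LS1B(1UL << 63) == 63);

    version (D_InlineAsm_X86_64)
        assert(one.LS1BAsm == 1, "Wrong LS1BAsm!");
}
```

The intrinsic version is probably the better default, since the compiler can inline it, which it cannot do with the asm block.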
Mar 05 2016