www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.announce - Libdivide ported to D

reply Tomer Filiba <tomerfiliba gmail.com> writes:
https://code.dlang.org/packages/divide

Libdivide (http://libdivide.com/) allows converting the DIV 
instruction (in runtime) to a series of shifts and MULs, which is 
much more efficient in execution time. It works by taking a 
number (the divisor or "denominator") and doing some 
preprocessing to it, after which dividing by it can be ~8 times 
faster (my own measurements). I use it to divide CPU cycles by 
the CPU frequency (i.e., two big ugly numbers) to obtain wall 
time from it.

Of course it only applies to runtime division -- the compiler can 
do the same if the divisor is known in compile time.

Notes:
* It's a header-only library so I ported the code itself
* I tried to keep my port as mechanical as possible; I can't 
really say I know what's going on there
* I only ported the POSIX x86-64 code because that's what I needed
* Signes-ness is a big issue, be sure you use the right variant


-tomer
May 14 2017
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 5/14/2017 3:39 AM, Tomer Filiba wrote:
 https://code.dlang.org/packages/divide

 Libdivide (http://libdivide.com/) allows converting the DIV instruction (in
 runtime) to a series of shifts and MULs, which is much more efficient in
 execution time. It works by taking a number (the divisor or "denominator") and
 doing some preprocessing to it, after which dividing by it can be ~8 times
 faster (my own measurements). I use it to divide CPU cycles by the CPU
frequency
 (i.e., two big ugly numbers) to obtain wall time from it.

 Of course it only applies to runtime division -- the compiler can do the same
if
 the divisor is known in compile time.
I hate to say this, but modern compilers already do this for generated runtime code when dividing by a constant. Here's dmd's version: https://github.com/dlang/dmd/blob/master/src/ddmd/backend/divcoeff.c
May 14 2017
parent reply David Nadlinger <code klickverbot.at> writes:
On Sunday, 14 May 2017 at 15:30:19 UTC, Walter Bright wrote:
 On 5/14/2017 3:39 AM, Tomer Filiba wrote:
 Of course it only applies to runtime division -- the compiler 
 can do the same if
 the divisor is known in compile time.
I hate to say this, but modern compilers already do this for generated runtime code when dividing by a constant. […]
As Tomer points out, a runtime implementation can still be useful if the divisor is only known dynamically (but constant across number of operations). — David
May 14 2017
parent reply Jonathan M Davis via Digitalmars-d-announce writes:
On Sunday, May 14, 2017 16:20:21 David Nadlinger via Digitalmars-d-announce 
wrote:
 On Sunday, 14 May 2017 at 15:30:19 UTC, Walter Bright wrote:
 On 5/14/2017 3:39 AM, Tomer Filiba wrote:
 Of course it only applies to runtime division -- the compiler
 can do the same if
 the divisor is known in compile time.
I hate to say this, but modern compilers already do this for generated runtime code when dividing by a constant. […]
As Tomer points out, a runtime implementation can still be useful if the divisor is only known dynamically (but constant across number of operations).
Liran was telling me last year about how the folks at Weka had used this to speed up the stuff in core.time and std.datetime in their local branch and wanted me to look into updating the official implementation to use it (unfortunately, I haven't had the time to spend on D that I would have liked and haven't managed to look into that yet - though that would require putting at least some of this in druntime). I confess though that I was highly confused about the whole thing, because it sounded like this was an optimization that the compiler already did, and yet the Weka guys were having to use libdivide some portion of the time. I suppose that it makes sense though if the issue is that the divisor is not known until runtime. But unfortunately, my understanding of compiler optimizations like this is fairly poor. - Jonathan M Davis
May 15 2017
parent Walter Bright <newshound2 digitalmars.com> writes:
On 5/15/2017 3:51 AM, Jonathan M Davis via Digitalmars-d-announce wrote:
 Liran was telling me last year about how the folks at Weka had used this to
 speed up the stuff in core.time and std.datetime in their local branch and
 wanted me to look into updating the official implementation to use it
 (unfortunately, I haven't had the time to spend on D that I would have liked
 and haven't managed to look into that yet - though that would require
 putting at least some of this in druntime). I confess though that I was
 highly confused about the whole thing, because it sounded like this was an
 optimization that the compiler already did, and yet the Weka guys were
 having to use libdivide some portion of the time. I suppose that it makes
 sense though if the issue is that the divisor is not known until runtime.
 But unfortunately, my understanding of compiler optimizations like this is
 fairly poor.
One can do things like this: if (divisor == 10) foreach (i; 1..1000) result += i / 10; // compiler generates faster code here else foreach (i; 1..1000) result += i / divisor; if one knows in advance popular values of divisor. This sort of thing is already done in Phobos when converting numbers <==> strings (optimizing for radix==10).
May 15 2017