digitalmars.D.announce - Libdivide ported to D

Tomer Filiba (18/18) May 14 2017 https://code.dlang.org/packages/divide

Walter Bright (4/13) May 14 2017 I hate to say this, but modern compilers already do this for generated r...

David Nadlinger (5/11) May 14 2017 As Tomer points out, a runtime implementation can still be useful

Jonathan M Davis via Digitalmars-d-announce (15/26) May 15 2017 Liran was telling me last year about how the folks at Weka had used this...

Walter Bright (10/22) May 15 2017 One can do things like this:

Tomer Filiba <tomerfiliba gmail.com> writes:

https://code.dlang.org/packages/divide

Libdivide (http://libdivide.com/) allows converting the DIV 
instruction (in runtime) to a series of shifts and MULs, which is 
much more efficient in execution time. It works by taking a 
number (the divisor or "denominator") and doing some 
preprocessing to it, after which dividing by it can be ~8 times 
faster (my own measurements). I use it to divide CPU cycles by 
the CPU frequency (i.e., two big ugly numbers) to obtain wall 
time from it.

Of course it only applies to runtime division -- the compiler can 
do the same if the divisor is known in compile time.

Notes:
* It's a header-only library so I ported the code itself
* I tried to keep my port as mechanical as possible; I can't 
really say I know what's going on there
* I only ported the POSIX x86-64 code because that's what I needed
* Signes-ness is a big issue, be sure you use the right variant


-tomer

May 14 2017

Walter Bright <newshound2 digitalmars.com> writes:

On 5/14/2017 3:39 AM, Tomer Filiba wrote:
 https://code.dlang.org/packages/divide

 Libdivide (http://libdivide.com/) allows converting the DIV instruction (in
 runtime) to a series of shifts and MULs, which is much more efficient in
 execution time. It works by taking a number (the divisor or "denominator") and
 doing some preprocessing to it, after which dividing by it can be ~8 times
 faster (my own measurements). I use it to divide CPU cycles by the CPU
frequency
 (i.e., two big ugly numbers) to obtain wall time from it.

 Of course it only applies to runtime division -- the compiler can do the same
if
 the divisor is known in compile time.

I hate to say this, but modern compilers already do this for generated runtime 
code when dividing by a constant. Here's dmd's version:

   https://github.com/dlang/dmd/blob/master/src/ddmd/backend/divcoeff.c

May 14 2017

David Nadlinger <code klickverbot.at> writes:

On Sunday, 14 May 2017 at 15:30:19 UTC, Walter Bright wrote:
 On 5/14/2017 3:39 AM, Tomer Filiba wrote:
 Of course it only applies to runtime division -- the compiler 
 can do the same if
 the divisor is known in compile time.

 I hate to say this, but modern compilers already do this for 
 generated runtime code when dividing by a constant. […]

As Tomer points out, a runtime implementation can still be useful 
if the divisor is only known dynamically (but constant across 
number of operations).

  — David

May 14 2017

Jonathan M Davis via Digitalmars-d-announce writes:

On Sunday, May 14, 2017 16:20:21 David Nadlinger via Digitalmars-d-announce 
wrote:
 On Sunday, 14 May 2017 at 15:30:19 UTC, Walter Bright wrote:
 On 5/14/2017 3:39 AM, Tomer Filiba wrote:
 Of course it only applies to runtime division -- the compiler
 can do the same if
 the divisor is known in compile time.

 I hate to say this, but modern compilers already do this for
 generated runtime code when dividing by a constant. […]

 As Tomer points out, a runtime implementation can still be useful
 if the divisor is only known dynamically (but constant across
 number of operations).

Liran was telling me last year about how the folks at Weka had used this to
speed up the stuff in core.time and std.datetime in their local branch and
wanted me to look into updating the official implementation to use it
(unfortunately, I haven't had the time to spend on D that I would have liked
and haven't managed to look into that yet - though that would require
putting at least some of this in druntime). I confess though that I was
highly confused about the whole thing, because it sounded like this was an
optimization that the compiler already did, and yet the Weka guys were
having to use libdivide some portion of the time. I suppose that it makes
sense though if the issue is that the divisor is not known until runtime.
But unfortunately, my understanding of compiler optimizations like this is
fairly poor.

- Jonathan M Davis

May 15 2017

Walter Bright <newshound2 digitalmars.com> writes:

On 5/15/2017 3:51 AM, Jonathan M Davis via Digitalmars-d-announce wrote:
 Liran was telling me last year about how the folks at Weka had used this to
 speed up the stuff in core.time and std.datetime in their local branch and
 wanted me to look into updating the official implementation to use it
 (unfortunately, I haven't had the time to spend on D that I would have liked
 and haven't managed to look into that yet - though that would require
 putting at least some of this in druntime). I confess though that I was
 highly confused about the whole thing, because it sounded like this was an
 optimization that the compiler already did, and yet the Weka guys were
 having to use libdivide some portion of the time. I suppose that it makes
 sense though if the issue is that the divisor is not known until runtime.
 But unfortunately, my understanding of compiler optimizations like this is
 fairly poor.


One can do things like this:

     if (divisor == 10)
         foreach (i; 1..1000)
             result += i / 10;  // compiler generates faster code here
     else
          foreach (i; 1..1000)
             result += i / divisor;

if one knows in advance popular values of divisor. This sort of thing is
already 
done in Phobos when converting numbers <==> strings (optimizing for radix==10).

May 15 2017

D Programming

C/C++ Programming

Other

digitalmars.D.announce - Libdivide ported to D