digitalmars.D - GDC vs dmd speed

Spacen Jasset <spacenjasset yahoo.co.uk> writes:

Hello,

Whilst porting some C++ code I have discovered that the output compiled 
by gdc seems to be 47% quicker than the output compiled by dmd. The code 
I believe to be the 'busy' code is below, although I could provide a 
complete test if anyone is interested. Is this an expected result, and/or 
is there something I could change to make the compilers perform 
similarly? The function render_minecraft gets called repeatedly to render 
single frames. framebufindex is a simple function that returns a buffer 
index. Perhaps it is not being inlined? rgba is another simple function.

Further details are:

dmd32 v2.063.2
with flags: ["-O", "-release", "-noboundscheck", "-inline"]

gdc 4.6 (0.29.1-4.6.4-1ubuntu4) Which I assume might be v2.020?
with flags: ["-O2"]



// render the next frame into the given 'frame_buf'
void render_minecraft(void * private_renderer_data, uint32_t * frame_buf)
{
     render_info * info = cast(render_info *)private_renderer_data;
     const float pi = 3.14159265f;

     float dx = cast(float)(Clock.currSystemTick.length %
                            (TickDuration.ticksPerSec * 10)) /
                (TickDuration.ticksPerSec * 10);
     float xRot = sin(dx * pi * 2) * 0.4f + pi / 2;
     float yRot = cos(dx * pi * 2) * 0.4f;
     float yCos = cos(yRot);
     float ySin = sin(yRot);
     float xCos = cos(xRot);
     float xSin = sin(xRot);

     float ox = 32.5f + dx * 64;
     float oy = 32.5f;
     float oz = 32.5f;

     for (int x = 0; x < width; ++x) {
         float ___xd = cast(float)(x - width / 2) / height;
         for (int y = 0; y < height; ++y) {
             float __yd = cast(float)(y - height / 2) / height;
             float __zd = 1;

             float ___zd = __zd * yCos + __yd * ySin;
             float _yd = __yd * yCos - __zd * ySin;

             float _xd = ___xd * xCos + ___zd * xSin;
             float _zd = ___zd * xCos - ___xd * xSin;

             uint32_t col = 0;
             uint32_t br = 255;
             float ddist = 0;
             float closest = 32;

             for (int d = 0; d < 3; ++d) {
                 float dimLength = _xd;
                 if (d == 1)
                     dimLength = _yd;
                 if (d == 2)
                     dimLength = _zd;

                 float ll = 1 / (dimLength < 0 ? -dimLength : dimLength);
                 float xd = (_xd) * ll;
                 float yd = (_yd) * ll;
                 float zd = (_zd) * ll;

                 float initial = ox - cast(int)ox;
                 if (d == 1)
                     initial = oy - cast(int)oy;
                 if (d == 2)
                     initial = oz - cast(int)oz;
                 if (dimLength > 0)
                     initial = 1 - initial;

                 float dist = ll * initial;

                 float xp = ox + xd * initial;
                 float yp = oy + yd * initial;
                 float zp = oz + zd * initial;

                 if (dimLength < 0) {
                     if (d == 0)
                         xp--;
                     if (d == 1)
                         yp--;
                     if (d == 2)
                         zp--;
                 }

                 while (dist < closest) {
                     uint tex = info.map[mapindex(xp, yp, zp)];

                     if (tex > 0) {
                         uint u = cast(uint32_t)((xp + zp) * 16) & 15;
                         uint v = (cast(uint32_t)(yp * 16) & 15) + 16;
                         if (d == 1) {
                             u = cast(uint32_t)(xp * 16) & 15;
                             v = (cast(uint32_t)(zp * 16) & 15);
                             if (yd < 0)
                                 v += 32;
                         }

                         uint32_t cc = info.texmap[u + v * 16 + tex * 256 * 3];
                         if (cc > 0) {
                             col = cc;
                             ddist = 255 - cast(int)(dist / 32 * 255);
                             br = 255 * (255 - ((d + 2) % 3) * 50) / 255;
                             closest = dist;
                         }
                     }
                     xp += xd;
                     yp += yd;
                     zp += zd;
                     dist += ll;
                 }
             }

             const uint32_t r = cast(uint32_t)(((col >> 16) & 0xff) * br
                                               * ddist / (255 * 255));
             const uint32_t g = cast(uint32_t)(((col >> 8) & 0xff) * br
                                               * ddist / (255 * 255));
             const uint32_t b = cast(uint32_t)(((col) & 0xff) * br
                                               * ddist / (255 * 255));

             frame_buf[framebufindex(x, y)] = rgba(r, g, b);
         }
     }
}

Oct 14 2013

"jerro" <a a.com> writes:

 Although I could provide a complete test if anyone is 
 interested. Is this an expected result and/or is there 
 something I could change to make the compilers perform 
 similarly.

Maybe you could do something to make the code compiled with DMD
perform better, but it is not unusual for GDC to produce 
significantly faster
code than DMD.

Oct 14 2013

"bearophile" <bearophileHUGS lycos.com> writes:

Spacen Jasset:

     const float pi = 3.14159265f;

     float dx = cast(float)(Clock.currSystemTick.length % 
 (TickDuration.ticksPerSec * 10)) / (TickDuration.ticksPerSec * 
 10);
     float xRot = sin(dx * pi * 2) * 0.4f + pi / 2;
     float yRot = cos(dx * pi * 2) * 0.4f;
     float yCos = cos(yRot);
     float ySin = sin(yRot);
     float xCos = cos(xRot);
     float xSin = sin(xRot);

     float ox = 32.5f + dx * 64;
     float oy = 32.5f;
     float oz = 32.5f;

     for (int x = 0; x < width; ++x) {
         float ___xd = cast(float)(x - width / 2) / height;
         for (int y = 0; y < height; ++y) {
             float __yd = cast(float)(y - height / 2) / height;
             float __zd = 1;


The performance difference between the DMD and GDC compile is 
kind of expected for FP-heavy code. Also try the new LDC2 
compiler (ldmd2 for the same compilation switches) that sometimes 
is better than GDC.

More comments:
- There is a PI in std.math (but it's not a float);
- Add immutable/const to every variable that doesn't need to 
change. This is a good habit like washing your hands before 
eating;
- "for (int x = 0; x < width; ++x)" ==> "foreach (immutable x; 0 
.. width)";
- I suggest avoiding many leading/trailing underscores in identifier 
names;
- It's probably worth replacing all those "float" with another 
name, like "FP" and then define "alias FP = float;" at the 
beginning. So you can see how much performance you lose/gain 
using floats/doubles. In many cases in my code there is no 
difference, but floats are less precise. Floats can be useful when 
you have many of them, in a struct or array. Floats can also be 
useful when you call certain numerical functions that compute 
their result by approximation, but on some CPUs sin/cos are not 
among those functions.

Bye,
bearophile
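
A minimal sketch of how a few of the suggestions above might look when applied together: the FP alias, PI from std.math, immutable locals and a foreach range. The helper names columnDirections and xRotation are hypothetical and not part of the original renderer; only the expressions inside them are taken from the posted code.

```d
import std.math : PI, sin;
import std.stdio : writeln;

// Change this one line to double to compare speed and precision.
alias FP = float;

// Hypothetical helper: horizontal ray offset for every screen column,
// the same expression as ___xd in the posted loop.
FP[] columnDirections(int width, int height)
{
    auto dirs = new FP[](width);
    foreach (immutable x; 0 .. width)
        dirs[x] = cast(FP)(x - width / 2) / height;
    return dirs;
}

// Hypothetical helper: the xRot expression from the posted code.
FP xRotation(FP dx)
{
    immutable FP pi = PI; // std.math.PI is a real; narrow it once
    return sin(dx * pi * 2) * 0.4f + pi / 2;
}

void main()
{
    writeln(columnDirections(4, 2));
    writeln(xRotation(0));
}
```

With the alias in place, switching between float and double is a one-line change, which makes it easy to time both variants with each compiler.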

Oct 14 2013

Spacen Jasset <spacenjasset yahoo.co.uk> writes:

On 14/10/2013 22:06, bearophile wrote:
 Spacen Jasset:

     const float pi = 3.14159265f;

     float dx = cast(float)(Clock.currSystemTick.length %
 (TickDuration.ticksPerSec * 10)) / (TickDuration.ticksPerSec * 10);
     float xRot = sin(dx * pi * 2) * 0.4f + pi / 2;
     float yRot = cos(dx * pi * 2) * 0.4f;
     float yCos = cos(yRot);
     float ySin = sin(yRot);
     float xCos = cos(xRot);
     float xSin = sin(xRot);

     float ox = 32.5f + dx * 64;
     float oy = 32.5f;
     float oz = 32.5f;

     for (int x = 0; x < width; ++x) {
         float ___xd = cast(float)(x - width / 2) / height;
         for (int y = 0; y < height; ++y) {
             float __yd = cast(float)(y - height / 2) / height;
             float __zd = 1;


 The performance difference between the DMD and GDC compile is kind of
 expected for FP-heavy code. Also try the new LDC2 compiler (ldmd2 for
 the same compilation switches) that sometimes is better than GDC.

 More comments:
 - There is a PI in std.math (but it's not a float);
 - Add immutable/const to every variable that doesn't need to change.
 This is a good habit like washing your hands before eating;
 - "for (int x = 0; x < width; ++x)" ==> "foreach (immutable x; 0 ..
 width)";
 - I suggest avoiding many leading/trailing underscores in identifier names;
 - It's probably worth replacing all those "float" with another name,
 like "FP" and then define "alias FP = float;" at the beginning. So you
 can see how much performance you lose/gain using floats/doubles. In many
 cases in my code there is no difference, but floats are less precise.
 Floats can be useful when you have many of them, in a struct or array.
 Floats can also be useful when you call certain numerical functions that
 compute their result by approximation, but on some CPUs sin/cos are not
 among those functions.

 Bye,
 bearophile

Thank you. I may take up some of those suggestions. It was a direct port 
of some C++, hence the style.

Oct 15 2013

Walter Bright <newshound2 digitalmars.com> writes:

On 10/14/2013 12:24 PM, Spacen Jasset wrote:
 dmd32 v2.063.2
 with flags: ["-O", "-release", "-noboundscheck", "-inline"]

 gdc 4.6 (0.29.1-4.6.4-1ubuntu4) Which I assume might be v2.020?
 with flags: ["-O2"]

dmd uses the x87 for 32 bit code for floating point, while gdc uses the SIMD 
instructions, which are faster.

For 64 bit code, dmd uses SIMD instructions too.
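
A small sketch for checking at compile time which kind of target a binary was built for, which may help when comparing 32-bit and 64-bit builds. It relies only on the predefined version identifiers X86, X86_64 and D_SIMD. Note that D_SIMD only says whether the compiler's vector extensions are available; it does not prove which instructions are used for scalar float math. The names hasVectorExtensions and targetArch are made up for this example.

```d
import std.stdio : writeln;

// Compile-time facts about the build target; nothing here is measured.
version (D_SIMD)
    enum bool hasVectorExtensions = true;
else
    enum bool hasVectorExtensions = false;

// Names the architecture this binary was compiled for.
string targetArch()
{
    version (X86_64)
        return "x86-64";
    else version (X86)
        return "32-bit x86";
    else
        return "other";
}

void main()
{
    writeln("target: ", targetArch());
    writeln("D_SIMD defined: ", hasVectorExtensions);
}
```

Building the same file with -m32 and with -m64 and comparing the output is a quick way to confirm which configuration is actually being benchmarked.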

Oct 14 2013

Spacen Jasset <spacenjasset yahoo.co.uk> writes:

On 14/10/2013 22:22, Walter Bright wrote:
 On 10/14/2013 12:24 PM, Spacen Jasset wrote:
 dmd32 v2.063.2
 with flags: ["-O", "-release", "-noboundscheck", "-inline"]

 gdc 4.6 (0.29.1-4.6.4-1ubuntu4) Which I assume might be v2.020?
 with flags: ["-O2"]

 dmd uses the x87 for 32 bit code for floating point, while gdc uses the
 SIMD instructions, which are faster.

 For 64 bit code, dmd uses SIMD instructions too.

Thanks Walter. I shall find a 64 bit system at some point to compare.

Oct 15 2013

"Paul Jurczak" <pauljurczak yahoo.com> writes:

On Monday, 14 October 2013 at 19:24:27 UTC, Spacen Jasset wrote:
 Hello,

 Whilst porting some C++ code I have discovered that the 
 compiled output from the gdc compiler seems to be 47% quicker 
 than the dmd compiler.

....

Here are a few more data points from microbenchmarks of simple 
functions (Project Euler). They support the observation 
(disclaimer: my microbenchmark is not a guarantee of your code 
performance, etc.) that the fastest code is produced by LDC, then 
GDC, and that DMD is the slowest.

Tested on Xubuntu 13.04 64-bit Core i5 3450S 2.8GHz.

--------------------
Test 1:

// 454ns   LDC 0.11.0: ldmd2 -m64 -O -noboundscheck -inline -release
// 830ns   GDC 4.8.1: gdc -m64 -march=native -fno-bounds-check -frename-registers -frelease -O3
// 1115ns  DMD64 2.063.2: dmd -O -noboundscheck -inline -release


int e28_0(int N = 1002) {
	int diagNumber = 1;					
	int sum        = diagNumber;	

	for (int width = 2; width < N; width += 2)	
		for (int j = 0; j < 4; ++j) {			
			diagNumber += width;				
			sum        += diagNumber;			
		}

	return sum;
}
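
The post does not show how the timings were taken. One possible way to measure a function like this is sketched below, assuming a recent Phobos that provides std.datetime.stopwatch. The helper nsPerCall and the run count are assumptions for illustration, not the harness that produced the numbers above, and an optimizer may still fold the repeated calls, so the result should be read with care.

```d
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : writefln;

// Test 1 from the post, reproduced so the sketch is self-contained.
int e28_0(int N = 1002)
{
    int diagNumber = 1;
    int sum = diagNumber;

    for (int width = 2; width < N; width += 2)
        for (int j = 0; j < 4; ++j)
        {
            diagNumber += width;
            sum += diagNumber;
        }

    return sum;
}

// Hypothetical helper: mean nanoseconds per call over `runs` calls.
// The results are accumulated into `checksum` so that they are used.
long nsPerCall(uint runs, out int checksum)
{
    auto sw = StopWatch(AutoStart.yes);
    foreach (_; 0 .. runs)
        checksum += e28_0(); // int overflow wraps in D, which is fine here
    sw.stop();
    return sw.peek.total!"nsecs" / runs;
}

void main()
{
    int checksum;
    immutable ns = nsPerCall(100_000, checksum);
    writefln("e28_0: %s ns per call (checksum %s)", ns, checksum);
}
```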

--------------------
Test 2:

// 118ms   LDC 0.11.0: ldmd2 -m64 -O -noboundscheck -inline -release
// 125ms   GDC 4.8.1: gdc -m64 -march=native -fno-bounds-check -frename-registers -frelease -O3
// 161ms   DMD64 2.063.2: dmd -O -noboundscheck -inline -release

import std.algorithm : equal;
import std.conv : to;
import std.range : retro;

bool isPalindrome(string s) {return equal(s, s.retro);}

int e4(int N = 1000) {
    int nMax = 0;

    foreach (uint i; 1..N)
       foreach (uint j; i..N)
          if (isPalindrome(to!string(i*j))  &&  i*j > nMax)
             nMax = i*j;

    return nMax;
}

--------------------
Test 3:

// 585us   LDC 0.11.0: ldmd2 -m64 -O -noboundscheck -inline -release
// 667us   GDC 4.8.1: gdc -m64 -march=native -fno-bounds-check -frename-registers -frelease -O3
// 853us   DMD64 2.063.2: dmd -O -noboundscheck -inline -release

int e67_0(string fileName = r"C:\Euler\data\e67.txt") {
    // Read triangle numbers from file.
    int[][] cell;

    foreach (line; splitLines(cast(char[]) read(fileName))) {
       int[] row;

       foreach (token; std.array.splitter(line))
          row ~= [to!int(token)];

       cell ~= row;
    }

    // Compute maximum value partial paths ending at each cell.
    foreach (y; 1..cell.length) {
       cell[y][0] += cell[y-1][0];

       foreach (x; 1..y)
          cell[y][x] += max(cell[y-1][x-1], cell[y-1][x]);

       cell[y][y] += cell[y-1][y-1];
    }

    // Return the maximum value terminal path.
    return cell[$-1].reduce!max;
}

--------------------
Here is the code speed relative to LDC, averaged over these three 
tests (a larger number is slower):
LDC 1.00
GDC 1.34
DMD 1.76
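
The averages above appear to be the arithmetic mean of the per-test ratios against LDC. A short sketch that reproduces them from the timings quoted in the three tests, where the function name meanRatio is made up for this example:

```d
import std.stdio : writefln;

// Mean of the per-test ratios times[i] / baseline[i].
double meanRatio(const double[] times, const double[] baseline)
{
    assert(times.length == baseline.length && times.length > 0);
    double total = 0;
    foreach (i; 0 .. times.length)
        total += times[i] / baseline[i];
    return total / times.length;
}

void main()
{
    // Timings from tests 1 to 3. The units differ per test (ns, ms, us),
    // which does not matter because only per-test ratios are used.
    immutable double[] ldc = [454.0, 118.0, 585.0];
    immutable double[] gdc = [830.0, 125.0, 667.0];
    immutable double[] dmd = [1115.0, 161.0, 853.0];

    writefln("LDC %.2f", meanRatio(ldc, ldc));
    writefln("GDC %.2f", meanRatio(gdc, ldc));
    writefln("DMD %.2f", meanRatio(dmd, ldc));
}
```

Most of the gap comes from Test 1, where GDC and DMD take about 1.8 and 2.5 times as long as LDC; in the other two tests the ratios stay below 1.5.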

Oct 15 2013

"John Colvin" <john.loughran.colvin gmail.com> writes:

On Monday, 14 October 2013 at 19:24:27 UTC, Spacen Jasset wrote:
 gdc 4.6 (0.29.1-4.6.4-1ubuntu4) Which I assume might be v2.020?
 with flags: ["-O2"]

That's a really old gdc. If you can, upgrade to Ubuntu 13.10 and 
you'll get a more up-to-date version. Alternatively, build from 
source: http://gdcproject.org/wiki/Installation/General
It'll take an age to run the compilation, but it's not hard to do.

Oct 15 2013
