
digitalmars.D - Struct method access speed

reply bearophile <bearophileHUGS lycos.com> writes:
A tiny benchmark of mine that's a reduced version of some real code (I hope my
code doesn't contain bugs/silly inefficiencies). Four compilations (gdc, dmd,
gcc, gcc aggressive):

gcc version 3.4.5 (mingw special) (gdc 0.24, using dmd 1.020)
gdc -O3 -s -frelease -finline-functions -ffast-math -fomit-frame-pointer
-funroll-loops -march=pentiumpro vec_test.d -o vec_test_gdc

dmd v1.024
dmd -O -release -inline vec_test.d

1) gcc version 3.4.2 (mingw-special)
g++ -s -O3 vec_testCpp.cpp -o vec_testCpp

2) gcc version 3.4.2 (mingw-special)
g++ -O3 -s -finline-functions -ffast-math -fomit-frame-pointer -funroll-loops
-march=pentiumpro vec_testCpp.cpp -o vec_testCpp


Timings in seconds, N = 20_000, on Pentium 3:

dmd (D):
  3.31
  1.66

gdc (D):
  8.92
  1.39

gcc 1 (C++):
  1.67
  1.66

gcc 2 (C++):
  1.36
  1.36

The interesting results are for dmd: I expected these two benchmarks to run in
the same time, but the experiment shows that's not the case for DMD.

Note that this isn't an inlining problem, because if you remove the -inline
flag you obtain much worse timings.


// D ------------------------
import std.stdio, std.gc, std.c.stdlib, std.c.time;

struct Vec(T, int N) {
  T* ap;

  // Allocate and default-initialize the backing array.
  void create() {
    ap = cast(T*)malloc(N * T.sizeof);
    for(int i; i < N; i++)  // i defaults to 0 in D
      ap[i] = T.init;
  }

  void opIndexAssign(T x, uint i) {
    ap[i] = x;
  }
}

void main() {
  const uint N = 20_000;
  alias uint T;

  auto t = clock();
  Vec!(T, N) v1;
  v1.create;
  for(uint i; i < N; i++)
    for(uint j; j < N; j++)
      v1[j] = j;
  printf("%.2f\n", cast(float)(clock()-t)/CLOCKS_PER_SEC);

  t = clock();
  auto v2 = cast(T*)malloc(N * T.sizeof);
  for(uint i; i < N; i++)
    for(uint j; j < N; j++)
      v2[j] = j;
  printf("%.2f\n", cast(float)(clock()-t)/CLOCKS_PER_SEC);
}


// C++ ------------------------
#include <time.h>
#include <stdio.h>
#include <stdlib.h>

#define N 20000

template <class T, int M> struct Vec {
  T* ap;

  void create() {
    ap = (T*)malloc(M * sizeof(T));
    for(int i = 0; i < M; i++)
      ap[i] = 0;
  }

  T &operator[](unsigned int i) {
    return ap[i];
  }
};

int main() {
  typedef unsigned int T;
  clock_t t;

  t = clock();
  Vec<T, N> v1;
  v1.create();
  for(unsigned int i = 0; i < N; i++)
    for(unsigned int j = 0; j < N; j++)
      v1[j] = j;
  printf("%.2f\n", (float)(clock()-t)/CLOCKS_PER_SEC);

  t = clock();
  T* v2 = (T*)malloc(N * sizeof(T));
  for(unsigned int i = 0; i < N; i++)
    for(unsigned int j = 0; j < N; j++)
      v2[j] = j;
  printf("%.2f\n", (float)(clock()-t)/CLOCKS_PER_SEC);
}

Bye,
bearophile
Dec 26 2007
parent reply "Jérôme M. Berger" <jeberger free.fr> writes:

bearophile wrote:
 [...]
 
 The interesting results are for dmd: I think such two benchmarks have to run
 for the same time, but the experiment shows it's false for DMD.

The one that surprises me is gdc. For DMD, you actually fill the vector twice
in the first loop (once with zeroes, then a second time with values) and it
runs about twice as long as the second loop, which is normal. However, for
gdc, I can't explain the large time for the first loop...

		Jerome
Dec 26 2007
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
M. Berger:
 For DMD, you actually fill the
 vector twice in the first loop (once with zeroes, then a second time
 with values) and it runs about twice as long as the second loop
 which is normal.

I don't agree. The method "create" is called only once, so it must add nearly
nothing (about 1/20000) to the first timing. Please correct me if you think I
am wrong.

Bye,
bearophile
Dec 26 2007
parent "Jérôme M. Berger" <jeberger free.fr> writes:

bearophile wrote:
 M. Berger:
 For DMD, you actually fill the
 vector twice in the first loop (once with zeroes, then a second time
 with values) and it runs about twice as long as the second loop
 which is normal.

 I don't agree. The method "create" is called only once, so it must add
 nearly nothing (about 1/20000) to the first timing. Please correct me if
 you think I am wrong.

[...] post just before going to bed...

		Jerome
Dec 26 2007
prev sibling next sibling parent Jarrod <qwerty ytre.wq> writes:
 The one that surprises me is gdc. For DMD, you actually fill the
 vector twice in the first loop (once with zeroes, then a second time
 with values) and it runs about twice as long as the second loop which is
 normal. However, for gdc, I can't explain the large time for the first
 loop...
 
 		Jerome

I agreed with you so I tested it out. I removed the for loop in create:

void create() {
  ap = cast(T*)malloc(N * T.sizeof);
}

Results:

timbus TiMBoX:~/Desktop$ dmd -O -release -inline what.d
gcc what.o -o what -m32 -lphobos -lpthread -lm
timbus TiMBoX:~/Desktop$ ./what
0.37
0.19

Seems opIndexAssign needs a bit of fine tuning.
Dec 26 2007
prev sibling parent Jarrod <qwerty ytre.wq> writes:
On Thu, 27 Dec 2007 02:02:09 +0000, Jarrod wrote:

 The one that surprises me is gdc. For DMD, you actually fill the vector
 twice in the first loop (once with zeroes, then a second time with
 values) and it runs about twice as long as the second loop which is
 normal. However, for gdc, I can't explain the large time for the first
 loop...
 
 		Jerome

 [...]
 
 Seems opIndexAssign needs a bit of fine tuning.

I also added the following method to the struct:

void assign(T x, uint i) {
  ap[i] = x;
}

Then instead of using

v1[i] = j;

I used

v1.assign(j, i);

It worked just as fast as the v2 way. So it's not an issue with member access.
It's an issue with overloading the array operator.
Dec 26 2007