www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Static arrays problem

reply bearophile <bearophileHUGS lycos.com> writes:
At the bottom of this post there are two little test programs that may show a
problem with static arrays.

D code compiled with DMD 1.038:
dmd test_d.d
Compile time: 8.29 s
Resulting exe and obj:
17.756.188 test_d.exe
     2.390 test_d.map
17.734.261 test_d.obj


C code compiled with DMC 8.42n:
dmc test_c.c
Compile time: 0.04 s
Resulting exe:
 51.740 test_c.exe


C code compiled with GCC (4.2.1):
gcc test_c.c -o test_c
Compile time: 0.16 s
Resulting exe:
    16.781 test_c.exe


C code compiled with LLVM-GCC (2.5):
llvm-gcc test_c.c -o test_c
Compile time: 0.11 s
Resulting exe:
    13.462 test_c.exe


(The following code is just reduced toy, but the full program runs up to about
13 times faster with static arrays, so I'd like to use them instead of the
dynamic ones.)

Using obj2asm on the test_d.obj it's made mostly of:

_D6test_d1aG3G65G65G130f:
	db	000h,000h,0c0h,07fh,000h,000h,0c0h,07fh
	db	000h,000h,0c0h,07fh,000h,000h,0c0h,07fh
	db	000h,000h,0c0h,07fh,000h,000h,0c0h,07fh
	db	000h,000h,0c0h,07fh,000h,000h,0c0h,07fh
...

-------------------------

// C code:

#include "stdio.h"

#define N 65
#define M (N+N)

float a[3][N][N][M];
float b[4][N][N][M];
float c[N][N][M];

int main() {
    int i, j, k;

    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
            for (k = 0; k < M; k++) {
                a[0][i][j][k] = 0.0;
                a[1][i][j][k] = 0.0;
                a[2][i][j][k] = 0.0;
                b[0][i][j][k] = 0.0;
                b[1][i][j][k] = 0.0;
                b[2][i][j][k] = 0.0;
                b[3][i][j][k] = 0.0;
                c[i][j][k] = 0.0;
            }

    printf("%f %f %f\n", a[2][10][10][10], a[2][10][10][10], c[10][10][10]);
    return 0;
}

-------------------------

// D code, it's the same,
// I have just changed the first 3 lines.

import std.c.stdio: printf;

const int N = 65;
const int M = (N+N);

float a[3][N][N][M];
float b[4][N][N][M];
float c[N][N][M];

int main() {
    int i, j, k;

    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
            for (k = 0; k < M; k++) {
                a[0][i][j][k] = 0.0;
                a[1][i][j][k] = 0.0;
                a[2][i][j][k] = 0.0;
                b[0][i][j][k] = 0.0;
                b[1][i][j][k] = 0.0;
                b[2][i][j][k] = 0.0;
                b[3][i][j][k] = 0.0;
                c[i][j][k] = 0.0;
            }

    printf("%f %f %f\n", a[2][10][10][10], a[2][10][10][10], c[10][10][10]);
    return 0;
}

Bye,
bearophile
Dec 17 2008
next sibling parent reply Sean Kelly <sean invisibleduck.org> writes:
== Quote from bearophile (bearophileHUGS lycos.com)'s article
 At the bottom of this post there are two little test programs that may show a
problem with static
arrays. Could you describe what the problem is?
 -------------------------
 // D code, it's the same,
 // I have just changed the first 3 lines.
 import std.c.stdio: printf;
 const int N = 65;
 const int M = (N+N);
 float a[3][N][N][M];
 float b[4][N][N][M];
 float c[N][N][M];
Consider changing this to: float a[3][N][N][M] = void; float b[4][N][N][M] = void; float c[N][N][M] = void; You don't need default initialization if you're just going to manually initialize to 0.0 anyway, so eliminate it. Sean
Dec 17 2008
parent reply bearophile <bearophileHUGS lycos.com> writes:
Sean Kelly:
 Could you describe what the problem is?
The problem is that the D code produces an huge executable.
 Consider changing this to:
 float a[3][N][N][M] = void;
 float b[4][N][N][M] = void;
 float c[N][N][M] = void;
 You don't need default initialization if you're just going to
 manually initialize to 0.0 anyway, so eliminate it.
Thank you, that fixes the problem, the exe size is now a more normal 180.252 bytes :-) I think the D compiler has to take a look at the size of the static arrays, if they are too much large, then their initialization has to be done with a loop, avoiding the creation of such huge executables. Bye, bearophile
Dec 17 2008
next sibling parent reply BCS <ao pathlink.com> writes:
Reply to bearophile,

 the exe size is now a more normal 180.252 bytes :-)
 
 I think the D compiler has to take a look at the size of the static
 arrays, if they are too much large, then their initialization has to
 be done with a loop, avoiding the creation of such huge executables.
 
OTOH that could end up large boot times.
Dec 17 2008
parent reply bearophile <bearophileHUGS lycos.com> writes:
BCS:
 OTOH that could end up large boot times.
Why? Is a "cleaning loop" with memset (done with modern CPU instructions) slower than the loading of a 17 MB executable? Bye, bearophile
Dec 17 2008
parent reply BCS <ao pathlink.com> writes:
Reply to bearophile,

 BCS:
 
 OTOH that could end up large boot times.
 
Why? Is a "cleaning loop" with memset (done with modern CPU instructions) slower than the loading of a 17 MB executable? Bye, bearophile
It depends. If the array is not accessed all at once, the OS needs only to set up the virtual memory maps and then page fault in the pages as needed. If the array is larger than the amount of memory that the OS would like to allow the program to keep in memory it could get even worse as it will start paging stuff out before the app even gets to main() (OTOH IIRC OPTLINK won't allow that big a static array).
Dec 17 2008
parent reply bearophile <bearophileHUGS lycos.com> writes:
BCS:
 It depends. If the array is not accessed all at once, the OS needs only to 
 set up the virtual memory maps and then page fault in the pages as needed. 
 If [...]
I have timed tree versions of that little code: D without = void: first run 1.85 s, successive ones: 0.11 s D with = void: first run 0.24 s, successive ones: 0.10 s C compiled with DMC: first run: 0.10 s, successive ones: 0.06 s C compiled with GCC: first run: 0.07 s, successive ones: 0.05 s (Compilations with no arguments. GCC compiled with -s -Os). So that huge executable is bad for the speed too. Bye, bearophile
Dec 17 2008
parent BCS <ao pathlink.com> writes:
Reply to bearophile,

 BCS:
 
 It depends. If the array is not accessed all at once, the OS needs
 only to set up the virtual memory maps and then page fault in the
 pages as needed. If [...]
 
I have timed tree versions of that little code: D without = void: first run 1.85 s, successive ones: 0.11 s D with = void: first run 0.24 s, successive ones: 0.10 s C compiled with DMC: first run: 0.10 s, successive ones: 0.06 s C compiled with GCC: first run: 0.07 s, successive ones: 0.05 s (Compilations with no arguments. GCC compiled with -s -Os). So that huge executable is bad for the speed too. Bye, bearophile
That dosn't test what I was looking at. Try it without the init loop, I'll bet you won't see any difference between the cases at all.
Dec 17 2008
prev sibling parent Sean Kelly <sean invisibleduck.org> writes:
== Quote from bearophile (bearophileHUGS lycos.com)'s article
 I think the D compiler has to take a look at the size of the static arrays, if
they are too much large, then their initialization has to be
done with a loop, avoiding the creation of such huge executables. TypeInfo should probably contain an init(void[]) routine instead of a void[] member which is byte copied onto the new block. Then, initializers that are easy to decompose could at least be turned into a series of memset operations. For built-in types, this should be very easy to do, since the functions could be hand-coded as opposed to requiring code generation from the compiler. Sean
Dec 17 2008
prev sibling parent "Denis Koroskin" <2korden gmail.com> writes:
On Thu, 18 Dec 2008 00:52:43 +0300, bearophile <bearophileHUGS lycos.com>  
wrote:

 At the bottom of this post there are two little test programs that may  
 show a problem with static arrays.

 D code compiled with DMD 1.038:
 dmd test_d.d
 Compile time: 8.29 s
 Resulting exe and obj:
 17.756.188 test_d.exe
      2.390 test_d.map
 17.734.261 test_d.obj


 C code compiled with DMC 8.42n:
 dmc test_c.c
 Compile time: 0.04 s
 Resulting exe:
  51.740 test_c.exe


 C code compiled with GCC (4.2.1):
 gcc test_c.c -o test_c
 Compile time: 0.16 s
 Resulting exe:
     16.781 test_c.exe


 C code compiled with LLVM-GCC (2.5):
 llvm-gcc test_c.c -o test_c
 Compile time: 0.11 s
 Resulting exe:
     13.462 test_c.exe


 (The following code is just reduced toy, but the full program runs up to  
 about 13 times faster with static arrays, so I'd like to use them  
 instead of the dynamic ones.)

 Using obj2asm on the test_d.obj it's made mostly of:

 _D6test_d1aG3G65G65G130f:
 	db	000h,000h,0c0h,07fh,000h,000h,0c0h,07fh
 	db	000h,000h,0c0h,07fh,000h,000h,0c0h,07fh
 	db	000h,000h,0c0h,07fh,000h,000h,0c0h,07fh
 	db	000h,000h,0c0h,07fh,000h,000h,0c0h,07fh
 ...

 -------------------------

 // C code:

 #include "stdio.h"

 #define N 65
 #define M (N+N)

 float a[3][N][N][M];
 float b[4][N][N][M];
 float c[N][N][M];

 int main() {
     int i, j, k;

     for (i = 0; i < N; i++)
         for (j = 0; j < N; j++)
             for (k = 0; k < M; k++) {
                 a[0][i][j][k] = 0.0;
                 a[1][i][j][k] = 0.0;
                 a[2][i][j][k] = 0.0;
                 b[0][i][j][k] = 0.0;
                 b[1][i][j][k] = 0.0;
                 b[2][i][j][k] = 0.0;
                 b[3][i][j][k] = 0.0;
                 c[i][j][k] = 0.0;
             }

     printf("%f %f %f\n", a[2][10][10][10], a[2][10][10][10],  
 c[10][10][10]);
     return 0;
 }

 -------------------------

 // D code, it's the same,
 // I have just changed the first 3 lines.

 import std.c.stdio: printf;

 const int N = 65;
 const int M = (N+N);

 float a[3][N][N][M];
 float b[4][N][N][M];
 float c[N][N][M];

 int main() {
     int i, j, k;

     for (i = 0; i < N; i++)
         for (j = 0; j < N; j++)
             for (k = 0; k < M; k++) {
                 a[0][i][j][k] = 0.0;
                 a[1][i][j][k] = 0.0;
                 a[2][i][j][k] = 0.0;
                 b[0][i][j][k] = 0.0;
                 b[1][i][j][k] = 0.0;
                 b[2][i][j][k] = 0.0;
                 b[3][i][j][k] = 0.0;
                 c[i][j][k] = 0.0;
             }

     printf("%f %f %f\n", a[2][10][10][10], a[2][10][10][10],  
 c[10][10][10]);
     return 0;
 }

 Bye,
 bearophile
This is a linker issue. DMD stores a, b,c AND a.init, b.init, c.init (doubling data size). Intelligent linker could strip a.init, b.init and c.init if they are never accessed.
Dec 18 2008