www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.learn - long compile time question

reply "Dan" <dbdavidson yahoo.com> writes:
The following takes nearly three minutes to compile.
The culprit is the line bar ~= B();
What is wrong with this?

Thanks,
Dan
----------------
struct B {
   const size_t SIZE = 1024*64;
   int[SIZE] x;
}

void main() {
   B[] barr;
   barr ~= B();
}
-----------------
Oct 23 2012
next sibling parent reply 1100110 <0b1100110 gmail.com> writes:
On Tue, 23 Oct 2012 22:50:46 -0500, Dan <dbdavidson yahoo.com> wrote:

 The following takes nearly three minutes to compile.
 The culprit is the line bar ~= B();
 What is wrong with this?

 Thanks,
 Dan
 ----------------
 struct B {
    const size_t SIZE = 1024*64;
    int[SIZE] x;
 }

 void main() {
    B[] barr;
    barr ~= B();
 }
 -----------------
I have the same issue on linux x64 2.060 So appending to a dynamic array isn't really that efficient. But this goes WAY over that line. I'm timing your test now. It's still going... -- Using Opera's revolutionary email client: http://www.opera.com/mail/
Oct 23 2012
parent reply "Era Scarecrow" <rtcvb32 yahoo.com> writes:
On Wednesday, 24 October 2012 at 04:49:19 UTC, 1100110 wrote:
 The following takes nearly three minutes to compile.
 The culprit is the line bar ~= B();
 What is wrong with this?
 I have the same issue on linux x64 2.060

 So appending to a dynamic array isn't really that efficient. 
 But this goes WAY over that line.  I'm timing your test now.

 It's still going...
It appears it's all happening during copying init, why I am not sure. [code] struct B { enum SIZE = 1024 * 64; int[SIZE] x; } //same timing issue, no array involved void test(B b) {} void main() { test(B()); } [/code] I've changed the *64 to various numbers and gotten curious results. The mb is the noted (estimated) memory footprint it used during compiling. 1: 0m0.725s mb:?? 2: 0m1.414s mb:?? 4: 0m2.620s mb:28 8: 0m8.937s mb:30 16: 0m35.869s mb:34 32: 2m36.922s mb:42 64: 9m27.353s mb:56
Oct 24 2012
parent "Maxim Fomin" <maxim maxim-fomin.ru> writes:
On Wednesday, 24 October 2012 at 09:50:38 UTC, Era Scarecrow 
wrote:
 On Wednesday, 24 October 2012 at 04:49:19 UTC, 1100110 wrote:
 The following takes nearly three minutes to compile.
 The culprit is the line bar ~= B();
 What is wrong with this?
 I have the same issue on linux x64 2.060

 So appending to a dynamic array isn't really that efficient. 
 But this goes WAY over that line.  I'm timing your test now.

 It's still going...
It appears it's all happening during copying init, why I am not sure. [code] struct B { enum SIZE = 1024 * 64; int[SIZE] x; } //same timing issue, no array involved void test(B b) {} void main() { test(B()); } [/code] I've changed the *64 to various numbers and gotten curious results. The mb is the noted (estimated) memory footprint it used during compiling. 1: 0m0.725s mb:?? 2: 0m1.414s mb:?? 4: 0m2.620s mb:28 8: 0m8.937s mb:30 16: 0m35.869s mb:34 32: 2m36.922s mb:42 64: 9m27.353s mb:56
According to assembly dmd just generates repetitive instructions to zero memory instead of making it in a loop. A workaround is to initialize array to void and then zero it in a loop.
Oct 24 2012
prev sibling next sibling parent reply "thedeemon" <dlang thedeemon.com> writes:
On Wednesday, 24 October 2012 at 03:50:47 UTC, Dan wrote:
 The following takes nearly three minutes to compile.
 The culprit is the line bar ~= B();
 What is wrong with this?

 Thanks,
 Dan
 ----------------
 struct B {
   const size_t SIZE = 1024*64;
   int[SIZE] x;
 }

 void main() {
   B[] barr;
   barr ~= B();
 }
 -----------------
The code DMD generates for initializing the struct does not use loops, so it's xor ecx, ecx mov [eax], ecx mov [eax+4], ecx mov [eax+8], ecx mov [eax+0Ch], ecx mov [eax+10h], ecx mov [eax+14h], ecx mov [eax+18h], ecx mov [eax+1Ch], ecx mov [eax+20h], ecx mov [eax+24h], ecx mov [eax+28h], ecx mov [eax+2Ch], ecx mov [eax+30h], ecx mov [eax+34h], ecx mov [eax+38h], ecx ... So your code creates a lot of work for the compiler.
Oct 24 2012
next sibling parent reply Don Clugston <dac nospam.com> writes:
On 24/10/12 17:39, thedeemon wrote:
 On Wednesday, 24 October 2012 at 03:50:47 UTC, Dan wrote:
 The following takes nearly three minutes to compile.
 The culprit is the line bar ~= B();
 What is wrong with this?

 Thanks,
 Dan
 ----------------
 struct B {
   const size_t SIZE = 1024*64;
   int[SIZE] x;
 }

 void main() {
   B[] barr;
   barr ~= B();
 }
 -----------------
The code DMD generates for initializing the struct does not use loops, so it's xor ecx, ecx mov [eax], ecx mov [eax+4], ecx mov [eax+8], ecx mov [eax+0Ch], ecx mov [eax+10h], ecx mov [eax+14h], ecx mov [eax+18h], ecx mov [eax+1Ch], ecx mov [eax+20h], ecx mov [eax+24h], ecx mov [eax+28h], ecx mov [eax+2Ch], ecx mov [eax+30h], ecx mov [eax+34h], ecx mov [eax+38h], ecx ... So your code creates a lot of work for the compiler.
That's incredibly horrible, please add to bugzilla.
Oct 24 2012
next sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Wed, Oct 24, 2012 at 06:04:10PM +0200, Don Clugston wrote:
 On 24/10/12 17:39, thedeemon wrote:
On Wednesday, 24 October 2012 at 03:50:47 UTC, Dan wrote:
The following takes nearly three minutes to compile.
The culprit is the line bar ~= B();
What is wrong with this?

Thanks,
Dan
----------------
struct B {
  const size_t SIZE = 1024*64;
  int[SIZE] x;
}

void main() {
  B[] barr;
  barr ~= B();
}
-----------------
The code DMD generates for initializing the struct does not use loops, so it's xor ecx, ecx mov [eax], ecx mov [eax+4], ecx mov [eax+8], ecx mov [eax+0Ch], ecx mov [eax+10h], ecx mov [eax+14h], ecx mov [eax+18h], ecx mov [eax+1Ch], ecx
[...] Yikes!! Why aren't we using memset (or equivalent) here?!
 That's incredibly horrible, please add to bugzilla.
[...] Yeah, no kidding! For comparison, GDC does a way better job in this department: $ time dmd test.d real 0m7.564s user 0m7.529s sys 0m0.029s $ time gdc test.d real 0m0.107s user 0m0.069s sys 0m0.036s $ This is with SIZE = 1024*16 (I didn't dare try it with 1024*64). That's a 75:1 ratio between dmd and gdc, which is pretty horrible, since dmd is usually significantly faster than gdc. Surprisingly, though, dmd still produces a smaller executable than gdc for this code! I'm guessing the optimizer cleans up that code afterwards? (Or maybe there are other factors at play here that I'm not aware of.) T -- EMACS = Extremely Massive And Cumbersome System
Oct 24 2012
parent "thedeemon" <dlang thedeemon.com> writes:
On Wednesday, 24 October 2012 at 17:43:11 UTC, H. S. Teoh wrote:

 Surprisingly, though, dmd still produces a smaller executable 
 than gdc
 for this code! I'm guessing the optimizer cleans up that code
 afterwards? (Or maybe there are other factors at play here that 
 I'm not aware of.)
Must be other factors. "Optimized" code (generated by dmd with -release -O) looks like mov DWORD PTR [edx], 0 mov DWORD PTR [edx+4], 0 mov DWORD PTR [edx+8], 0 ... so it should be even bigger and probably slower.
Oct 24 2012
prev sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Wednesday, October 24, 2012 18:04:10 Don Clugston wrote:
 That's incredibly horrible, please add to bugzilla.
There are at least a couple of potentially related bugs already: http://d.puremagic.com/issues/show_bug.cgi?id=8828 http://d.puremagic.com/issues/show_bug.cgi?id=8449 though this may still merit a new report. - Jonathan M Davis
Oct 24 2012
prev sibling parent reply "Era Scarecrow" <rtcvb32 yahoo.com> writes:
On Wednesday, 24 October 2012 at 15:39:19 UTC, thedeemon wrote:
 The code DMD generates for initializing the struct does not use 
 loops, so it's
 xor     ecx, ecx
 mov     [eax], ecx
 mov     [eax+4], ecx
 mov     [eax+8], ecx
 ...

 So your code creates a lot of work for the compiler.
That seems silly. I would think after the struct's init/contents were known it would make a single block that holds the basic init for it and bulk copy every time it needed it (if it's beyond a certain size, say 32 bytes). Also memset only works if the data can be defaulted to 0. Hmmm...
Oct 24 2012
parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Thu, Oct 25, 2012 at 12:05:25AM +0200, Era Scarecrow wrote:
 On Wednesday, 24 October 2012 at 15:39:19 UTC, thedeemon wrote:
The code DMD generates for initializing the struct does not use
loops, so it's
xor     ecx, ecx
mov     [eax], ecx
mov     [eax+4], ecx
mov     [eax+8], ecx
...

So your code creates a lot of work for the compiler.
That seems silly. I would think after the struct's init/contents were known it would make a single block that holds the basic init for it and bulk copy every time it needed it (if it's beyond a certain size, say 32 bytes). Also memset only works if the data can be defaulted to 0. Hmmm...
Not true. Memset is *usually* used to set memory to 0, but it can set memory to other byte values too. Although this doesn't help if the .init value isn't something consisting of repeated byte values. In any case, though, separately initializing every member of an array is silly. That's what a loop is for. That, or a memcpy from an immutable copy of .init. T -- Life would be easier if I had the source code. -- YHL
Oct 24 2012
parent reply "BLM768" <blm768 gmail.com> writes:
 In any case, though, separately initializing every member of an 
 array is
 silly. That's what a loop is for. That, or a memcpy from an 
 immutable
 copy of .init.
I think that the reasoning behind DMD's implementation is that for small structs, writing out the instructions is more efficient than a loop or a memcpy(); it's essentially the equivalent of loop unrolling and function inlining. However, that reasoning breaks down as soon as the struct's size goes beyond a certain value. In my opinion, though, this behavior should be kept for small structs. For example, if you have a struct that just wraps a size_t, just generating a move instruction is _way_ faster than a call to memcpy().
Oct 27 2012
parent "BLM768" <blm768 gmail.com> writes:
On Saturday, 27 October 2012 at 23:07:19 UTC, BLM768 wrote:
 In any case, though, separately initializing every member of 
 an array is
 silly. That's what a loop is for. That, or a memcpy from an 
 immutable
 copy of .init.
I think that the reasoning behind DMD's implementation is that for small structs, writing out the instructions is more efficient than a loop or a memcpy(); it's essentially the equivalent of loop unrolling and function inlining. However, that reasoning breaks down as soon as the struct's size goes beyond a certain value. In my opinion, though, this behavior should be kept for small structs. For example, if you have a struct that just wraps a size_t, just generating a move instruction is _way_ faster than a call to memcpy().
I just realized that this post is redundant; other posts have also mentioned optimization for small structs. That makes two relatively dumb posts from me in a day; maybe I should just stop for now. :)
Oct 27 2012
prev sibling parent reply Manfred Nowak <svv1999 hotmail.com> writes:
Dan wrote:

 The following takes nearly three minutes to compile.
... and this returns immediately: -------------------------------- struct B { const size_t SIZE = 1024*64; int[SIZE] x= void; // !!! } void main() { B[] barr; barr ~= B(); } ------------------------------- - manfred
Oct 24 2012
parent "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Wednesday, October 24, 2012 21:35:22 Manfred Nowak wrote:
 Dan wrote:
 The following takes nearly three minutes to compile.
... and this returns immediately: -------------------------------- struct B { const size_t SIZE = 1024*64; int[SIZE] x= void; // !!! } void main() { B[] barr; barr ~= B(); } ------------------------------- - manfred
That's incredibly dangerous though, because than B.init has garbage in it. Still, it's interesting that it would affect the speed. Maybe the compiler recognizes that it's garbage and so doesn't bother copying it (meaning that you'll get different garbage for every use of B.init). - Jonathan M Davis
Oct 24 2012