
digitalmars.D.learn - Loop optimization

reply kai <kai nospam.zzz> writes:
Hello,

I was evaluating using D for some numerical stuff. However I was surprised to
find that looping & array indexing was not very speedy compared to
alternatives (gcc et al). I was using the DMD2 compiler on mac and windows,
with -O -release. Here is a boiled down test case:

	void main (string[] args)
	{
		double [] foo = new double [cast(int)1e6];
		for (int i=0;i<1e3;i++)
		{
			for (int j=0;j<1e6-1;j++)
			{
				foo[j]=foo[j]+foo[j+1];
			}
		}
	}

Any ideas? Am I somehow not hitting a vital compiler optimization? Thanks for
your help.
May 13 2010
next sibling parent "Lars T. Kyllingstad" <public kyllingen.NOSPAMnet> writes:
On Fri, 14 May 2010 02:38:40 +0000, kai wrote:

 Hello,
 
 I was evaluating using D for some numerical stuff. However I was
 surprised to find that looping & array indexing was not very speedy
 compared to alternatives (gcc et al). I was using the DMD2 compiler on
 mac and windows, with -O -release. Here is a boiled down test case:
 
 	void main (string[] args)
 	{
 		double [] foo = new double [cast(int)1e6]; for (int 
i=0;i<1e3;i++)
 		{
 			for (int j=0;j<1e6-1;j++)
 			{
 				foo[j]=foo[j]+foo[j+1];
 			}
 		}
 	}
 
 Any ideas? Am I somehow not hitting a vital compiler optimization?
 Thanks for your help.
Two suggestions: 1. Have you tried the -noboundscheck compiler switch? Unlike C, D checks that you do not try to read/write beyond the end of an array, but you can turn those checks off with said switch. 2. Can you use vector operations? If the example you gave is representative of your specific problem, then you can't because you are adding overlapping parts of the array. But if you are doing operations on separate arrays, then array operations will be *much* faster. http://www.digitalmars.com/d/2.0/arrays.html#array-operations As an example, compare the run time of the following code with the example you gave: void main () { double[] foo = new double [cast(int)1e6]; double[] slice1 = foo[0 .. 999_998]; double[] slice2 = foo[1 .. 999_999]; for (int i=0;i<1e3;i++) { // BAD, BAD, BAD. DON'T DO THIS even though // it's pretty awesome: slice1[] += slice2[]; } } Note that this is very bad code, since slice1 and slice2 are overlapping arrays, and there is no guarantee as to which order the array elements are computed -- it may even occur in parallel. It was just an example of the speed gains you may expect from designing your code with array operations in mind. -Lars
May 13 2010
prev sibling next sibling parent reply "Lars T. Kyllingstad" <public kyllingen.NOSPAMnet> writes:
On Fri, 14 May 2010 02:38:40 +0000, kai wrote:

 Hello,
 
 I was evaluating using D for some numerical stuff. However I was
 surprised to find that looping & array indexing was not very speedy
 compared to alternatives (gcc et al). I was using the DMD2 compiler on
 mac and windows, with -O -release. Here is a boiled down test case:
 
 	void main (string[] args)
 	{
 		double [] foo = new double [cast(int)1e6];
 		for (int i=0;i<1e3;i++)
 		{
 			for (int j=0;j<1e6-1;j++)
 			{
 				foo[j]=foo[j]+foo[j+1];
 			}
 		}
 	}
 
 Any ideas? Am I somehow not hitting a vital compiler optimization?
 Thanks for your help.
Two suggestions:

1. Have you tried the -noboundscheck compiler switch?  Unlike C, D checks
that you do not try to read/write beyond the end of an array, but you can
turn those checks off with said switch.

2. Can you use vector operations?  If the example you gave is
representative of your specific problem, then you can't, because you are
adding overlapping parts of the array.  But if you are doing operations
on separate arrays, then array operations will be *much* faster.

http://www.digitalmars.com/d/2.0/arrays.html#array-operations

As an example, compare the run time of the following code with the
example you gave:

    void main ()
    {
        double[] foo = new double [cast(int)1e6];
        double[] slice1 = foo[0 .. 999_998];
        double[] slice2 = foo[1 .. 999_999];

        for (int i=0;i<1e3;i++)
        {
            // BAD, BAD, BAD.  DON'T DO THIS even though
            // it's pretty awesome:
            slice1[] += slice2[];
        }
    }

Note that this is very bad code, since slice1 and slice2 are overlapping
arrays, and there is no guarantee as to which order the array elements
are computed -- it may even occur in parallel.  It was just an example of
the speed gains you may expect from designing your code with array
operations in mind.

-Lars
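For comparison, a minimal sketch of the well-defined case with *separate*
(non-overlapping) arrays -- illustration only, assuming a D2 compiler, with
made-up array names:

    // Element-wise array operations on distinct arrays: no overlap,
    // so the result is well defined and the loop can be vectorized.
    void main ()
    {
        auto a = new double[1_000_000];
        auto b = new double[1_000_000];
        auto c = new double[1_000_000];
        b[] = 1.0;          // initialize, so we don't operate on NaNs
        c[] = 2.0;

        a[] = b[] + c[];    // element-wise add into a
        a[] += b[];         // element-wise in-place add
    }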
May 13 2010
next sibling parent "Lars T. Kyllingstad" <public kyllingen.NOSPAMnet> writes:
On Fri, 14 May 2010 06:31:29 +0000, Lars T. Kyllingstad wrote:
     void main ()
     {
         double[] foo = new double [cast(int)1e6];
         double[] slice1 = foo[0 .. 999_998];
         double[] slice2 = foo[1 .. 999_999];
 
         for (int i=0;i<1e3;i++)
         {
             // BAD, BAD, BAD.  DON'T DO THIS even though
             // it's pretty awesome:
             slice1[] += slice2[];
         }
     }
Hmm.. something very strange is going on with the line breaking here.

-Lars
May 13 2010
prev sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Fri, 14 May 2010 02:31:29 -0400, Lars T. Kyllingstad  
<public kyllingen.nospamnet> wrote:

 On Fri, 14 May 2010 02:38:40 +0000, kai wrote:
 I was using the DMD2 compiler on
 mac and windows, with -O -release.
1. Have you tried the -noboundscheck compiler switch? Unlike C, D checks that you do not try to read/write beyond the end of an array, but you can turn those checks off with said switch.
-release implies -noboundscheck (in fact, I did not know there was a noboundscheck flag, I thought you had to use -release).

-Steve
May 14 2010
parent "Lars T. Kyllingstad" <public kyllingen.NOSPAMnet> writes:
On Fri, 14 May 2010 07:32:54 -0400, Steven Schveighoffer wrote:

 On Fri, 14 May 2010 02:31:29 -0400, Lars T. Kyllingstad
 <public kyllingen.nospamnet> wrote:
 
 On Fri, 14 May 2010 02:38:40 +0000, kai wrote:
 I was using the DMD2 compiler on
 mac and windows, with -O -release.
1. Have you tried the -noboundscheck compiler switch? Unlike C, D checks that you do not try to read/write beyond the end of an array, but you can turn those checks off with said switch.
-release implies -noboundscheck (in fact, I did not know there was a noboundscheck flag, I thought you had to use -release). -Steve
You are right, just checked it now. But it's strange, I thought the whole point of the -noboundscheck switch was that it would be independent of -release. But perhaps I remember wrongly (or perhaps Walter just hasn't gotten around to it yet).

Anyway, sorry for the misinformation.

-Lars
May 14 2010
prev sibling next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
kai:

 I was evaluating using D for some numerical stuff.
For that evaluation you probably have to use the LDC compiler, which is able to optimize better.
 	void main (string[] args)
 	{
 		double [] foo = new double [cast(int)1e6];
 		for (int i=0;i<1e3;i++)
 		{
 			for (int j=0;j<1e6-1;j++)
 			{
 				foo[j]=foo[j]+foo[j+1];
 			}
 		}
 	}
Using floating point for indexes and lengths is not a good practice. In D large numbers are written like 1_000_000. Use -release too.
 Any ideas? Am I somehow not hitting a vital compiler optimization?
DMD compiler doesn't perform many optimizations, especially on floating point computations. But the bigger problem in your code is that you are performing operations on NaNs (that's the default initialization of FP values in D), and operations on NaNs are usually quite slower.

Your code in C:

#include "stdio.h"
#include "stdlib.h"
#define N 1000000

int main() {
    double *foo = calloc(N, sizeof(double)); // malloc suffices here
    int i, j;

    for (j = 0; j < N; j++)
        foo[j] = 1.0;

    for (i = 0; i < 1000; i++)
        for (j = 0; j < N-1; j++)
            foo[j] = foo[j] + foo[j + 1];

    printf("%f", foo[N-1]);
    return 0;
}

/*
gcc -O3 -s -Wall test.c -o test
Timings, outer loop=1_000 times: 7.72 s

------------------

gcc -Wall -O3 -fomit-frame-pointer -msse3 -march=native test.c -o test
(Running on a VirtualBox)
Timings, outer loop=1_000 times: 7.69 s

Just the inner loop:

.L7:
    fldl    8(%edx)
    fadd    %st, %st(1)
    fxch    %st(1)
    fstpl   (%edx)
    addl    $8, %edx
    cmpl    %ecx, %edx
    jne     .L7
*/

--------------------

Your code in D1:

version (Tango)
    import tango.stdc.stdio: printf;
else
    import std.c.stdio: printf;

void main() {
    const int N = 1_000_000;
    double[] foo = new double[N];
    foo[] = 1.0;

    for (int i = 0; i < 1_000; i++)
        for (int j = 0; j < N-1; j++)
            foo[j] = foo[j] + foo[j + 1];

    printf("%f", foo[N-1]);
}

/*
dmd -O -release -inline test.d
(Not running on a VirtualBox)
Timings, outer loop=1_000 times: 9.35 s

Just the inner loop:

L34:
    fld     qword ptr 8[EDX*8][ECX]
    fadd    qword ptr [EDX*8][ECX]
    fstp    qword ptr [EDX*8][ECX]
    inc     EDX
    cmp     EDX,0F423Fh
    jb      L34

-----------------------

ldc -O3 -release -inline test.d
(Running on a VirtualBox)
Timings, outer loop=1_000 times: 7.87 s

Just the inner loop:

.LBB1_2:
    movsd   (%eax,%ecx,8), %xmm0
    addsd   8(%eax,%ecx,8), %xmm0
    movsd   %xmm0, (%eax,%ecx,8)
    incl    %ecx
    cmpl    $999999, %ecx
    jne     .LBB1_2

-----------------------

ldc -unroll-allow-partial -O3 -release -inline test.d
(Running on a VirtualBox)
Timings, outer loop=1_000 times: 7.75 s

Just the inner loop:

.LBB1_2:
    movsd   (%eax,%ecx,8), %xmm0
    addsd   8(%eax,%ecx,8), %xmm0
    movsd   %xmm0, (%eax,%ecx,8)
    movsd   8(%eax,%ecx,8), %xmm0
    addsd   16(%eax,%ecx,8), %xmm0
    movsd   %xmm0, 8(%eax,%ecx,8)
    movsd   16(%eax,%ecx,8), %xmm0
    addsd   24(%eax,%ecx,8), %xmm0
    movsd   %xmm0, 16(%eax,%ecx,8)
    movsd   24(%eax,%ecx,8), %xmm0
    addsd   32(%eax,%ecx,8), %xmm0
    movsd   %xmm0, 24(%eax,%ecx,8)
    movsd   32(%eax,%ecx,8), %xmm0
    addsd   40(%eax,%ecx,8), %xmm0
    movsd   %xmm0, 32(%eax,%ecx,8)
    movsd   40(%eax,%ecx,8), %xmm0
    addsd   48(%eax,%ecx,8), %xmm0
    movsd   %xmm0, 40(%eax,%ecx,8)
    movsd   48(%eax,%ecx,8), %xmm0
    addsd   56(%eax,%ecx,8), %xmm0
    movsd   %xmm0, 48(%eax,%ecx,8)
    movsd   56(%eax,%ecx,8), %xmm0
    addsd   64(%eax,%ecx,8), %xmm0
    movsd   %xmm0, 56(%eax,%ecx,8)
    movsd   64(%eax,%ecx,8), %xmm0
    addsd   72(%eax,%ecx,8), %xmm0
    movsd   %xmm0, 64(%eax,%ecx,8)
    addl    $9, %ecx
    cmpl    $999999, %ecx
    jne     .LBB1_2
*/

As you see, the code generated by ldc is about as good as the one generated by gcc. There are of course other ways to optimize this code...

Bye,
bearophile
May 14 2010
next sibling parent reply strtr <strtr spam.com> writes:
== Quote from bearophile (bearophileHUGS lycos.com)'s article
 But the bigger problem in your code is that you are performing operations on
 NaNs (that's the default initialization of FP values in D), and operations on
 NaNs are usually quite slower.

I didn't know that. Is it the same for inf? I used it as a null for structs.
May 14 2010
parent reply Don <nospam nospam.com> writes:
strtr wrote:
 == Quote from bearophile (bearophileHUGS lycos.com)'s article
 But the bigger problem in your code is that you are performing operations on
 NaNs (that's the default initialization of FP values in D), and operations on
 NaNs are usually quite slower.

 I didn't know that. Is it the same for inf?

Yes, nan and inf are usually the same speed. However, it's very CPU dependent, and even *within* a CPU! On Pentium 4, for example, for x87, nan is 200 times slower than a normal value (!), but on Pentium 4 SSE there's no speed difference at all between nan and normal. I think there's no speed difference on AMD, but I'm not sure. There's almost no documentation on it at all.
 I used it as a null for structs.
 
May 15 2010
parent reply strtr <strtr spam.com> writes:
== Quote from Don (nospam nospam.com)'s article
 strtr wrote:
 == Quote from bearophile (bearophileHUGS lycos.com)'s article
 But the bigger problem in your code is that you are performing operations on
 NaNs (that's the default initialization of FP values in D), and operations on
 NaNs are usually quite slower.

 I didn't know that. Is it the same for inf?

 Yes, nan and inf are usually the same speed. However, it's very CPU dependent,
 and even *within* a CPU! On Pentium 4, for example, for x87, nan is 200 times
 slower than a normal value (!), but on Pentium 4 SSE there's no speed
 difference at all between nan and normal. I think there's no speed difference
 on AMD, but I'm not sure. There's almost no documentation on it at all.

Thanks! NaNs being slower I can understand, but inf might well be a value you want to use.
 I used it as a null for structs.
May 15 2010
parent Don <nospam nospam.com> writes:
strtr wrote:
 == Quote from Don (nospam nospam.com)'s article
 strtr wrote:
 == Quote from bearophile (bearophileHUGS lycos.com)'s article
 But the bigger problem in your code is that you are performing operations on
 NaNs (that's the default initialization of FP values in D), and operations on
 NaNs are usually quite slower.

 I didn't know that. Is it the same for inf?

 Yes, nan and inf are usually the same speed. However, it's very CPU dependent,
 and even *within* a CPU! On Pentium 4, for example, for x87, nan is 200 times
 slower than a normal value (!), but on Pentium 4 SSE there's no speed
 difference at all between nan and normal. I think there's no speed difference
 on AMD, but I'm not sure. There's almost no documentation on it at all.

 Thanks! NaNs being slower I can understand, but inf might well be a value you
 want to use.

Yes. What's happened is that none of the popular programming languages support special IEEE values, so they're given very low priority by chip designers. In the Pentium 4 case, they're implemented entirely in microcode. A 200X slowdown is really significant.

However, the bit pattern for NaN is 0xFFFF..., which is the same as a negative integer, so an uninitialized floating-point variable has a quite high probability of being a NaN. I'm certain there's a lot of C programs out there which are inadvertently using NaNs.
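A minimal D2 sketch of that last point (std.stdio and std.math assumed; the
variable names are made up):

    import std.stdio, std.math;

    void main()
    {
        double d;                        // D default-initializes this to NaN
        writeln(isNaN(d));               // prints: true

        // An "all ones" bit pattern (a negative integer, reinterpreted) also
        // reads back as a NaN: sign 1, exponent all ones, nonzero mantissa.
        ulong bits = 0xFFFF_FFFF_FFFF_FFFFUL;
        double reinterpreted = *(cast(double*) &bits);
        writeln(isNaN(reinterpreted));   // prints: true
    }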
May 15 2010
prev sibling next sibling parent reply Don <nospam nospam.com> writes:
bearophile wrote:
 kai:
 Any ideas? Am I somehow not hitting a vital compiler optimization?
DMD compiler doesn't perform many optimizations, especially on floating point computations.
More precisely: In terms of optimizations performed, DMD isn't too far behind gcc. But it performs almost no optimization on floating point. Also, the inliner doesn't yet support the newer D features (this won't be hard to fix) and the scheduler is based on Pentium1.
May 15 2010
parent reply Walter Bright <newshound1 digitalmars.com> writes:
Don wrote:
 bearophile wrote:
 kai:
 Any ideas? Am I somehow not hitting a vital compiler optimization?
DMD compiler doesn't perform many optimizations, especially on floating point computations.
More precisely: In terms of optimizations performed, DMD isn't too far behind gcc. But it performs almost no optimization on floating point. Also, the inliner doesn't yet support the newer D features (this won't be hard to fix) and the scheduler is based on Pentium1.
Have to be careful when talking about floating point optimizations. For example,

    x/c  =>  x * 1/c

is not done because of roundoff error. Also,

    0 * x  =>  0

is also not done because it is not a correct replacement if x is a NaN.
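A minimal D2 sketch of why the second rewrite would be wrong (std.stdio assumed):

    import std.stdio;

    void main()
    {
        double x = double.nan;
        writeln(0.0 * x);   // prints: nan -- folding 0*x to 0 would change the result
        double y = double.infinity;
        writeln(0.0 * y);   // prints: nan -- 0 * infinity is also NaN in IEEE 754
    }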
May 16 2010
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Walter Bright:

 is not done because of roundoff error. Also,
     0 * x => 0
 is also not done because it is not a correct replacement if x is a NaN.
I have done a little experiment, compiling this D1 code with LDC:

import tango.stdc.stdio: printf;

void main(char[][] args) {
    double x = cast(double)args.length;
    double y = 0 * x;
    printf("%f\n", y);
}

I think the asm generated by ldc shows what you say:

ldc -O3 -release -inline -output-s test

_Dmain:
    pushl   %ebp
    movl    %esp, %ebp
    andl    $-16, %esp
    subl    $32, %esp
    movsd   .LCPI1_0, %xmm0
    movd    8(%ebp), %xmm1
    orps    %xmm0, %xmm1
    subsd   %xmm0, %xmm1
    pxor    %xmm0, %xmm0
    mulsd   %xmm1, %xmm0
    movsd   %xmm0, 4(%esp)
    movl    $.str, (%esp)
    call    printf
    xorl    %eax, %eax
    movl    %ebp, %esp
    popl    %ebp
    ret     $8

So I have added an extra "unsafe floating point" optimization:

ldc -O3 -release -inline -enable-unsafe-fp-math -output-s test

_Dmain:
    subl    $12, %esp
    movl    $0, 8(%esp)
    movl    $0, 4(%esp)
    movl    $.str, (%esp)
    call    printf
    xorl    %eax, %eax
    addl    $12, %esp
    ret     $8

GCC has similar switches.

Bye,
bearophile
May 17 2010
parent reply Walter Bright <newshound1 digitalmars.com> writes:
bearophile wrote:
 So I have added an extra "unsafe floating point" optimization:
 
 ldc -O3 -release -inline -enable-unsafe-fp-math -output-s test
In my view, such switches are bad news, because:

1. very few people understand the issues regarding wrong floating point optimizations

2. even those that do, are faced with a switch that doesn't really define what unsafe fp optimizations it is doing, so there's no way to tell how it affects their code

3. the behavior of such a switch may change over time, breaking one's carefully written code

4. most of those optimizations can be done by hand if you want to, meaning that then their behavior will be reliable, portable and correct for your application

5. in my experience with such switches, almost nobody uses them, and the few that do use them wrongly

6. they add clutter, complexity, confusion and errors to the documentation

7. they use it, their code doesn't work correctly, they blame the compiler/language and waste the time of the tech support people
May 17 2010
parent reply bearophile <bearophileHUGS lycos.com> writes:
Walter Bright:

In my view, such switches are bad news, because:<
The Intel compiler, Microsoft compiler, GCC and LLVM have a similar switch (fp:fast in the Microsoft compiler, -ffast-math on GCC, etc). So you might send your list of comments to the devs of each of those four compilers.

I have used the "unsafe fp" switch in LDC to run my small raytracers faster, with good results. So I use it now and then where max precision is not important and small errors are not going to ruin the output.

I have asked the LLVM head developer to improve this optimization on LLVM, because in my opinion it's not aggressive enough, to put LLVM on par with GCC. So LDC too will probably get better on this in future.

This unsafe optimization is off by default, so if you don't like it you can avoid it. Its presence in LDC has caused zero problems to me so far (because when I need safer/more precise results I don't use it).

 4. most of those optimizations can be done by hand if you want to, meaning that
 then their behavior will be reliable, portable and correct for your application

This is true for any optimization.

Bye,
bearophile
May 17 2010
parent Walter Bright <newshound1 digitalmars.com> writes:
bearophile wrote:
 Walter Bright:
 
 In my view, such switches are bad news, because:<
The Intel compiler, Microsoft compiler, GCC and LLVM have a similar switch (fp:fast in the Microsoft compiler, -ffast-math on GCC, etc). So you might send your list of comments to the devs of each of those four compilers.
If I agreed with everything other vendors did with their compilers, I wouldn't have built my own <g>.
May 18 2010
prev sibling parent reply Don <nospam nospam.com> writes:
Walter Bright wrote:
 Don wrote:
 bearophile wrote:
 kai:
 Any ideas? Am I somehow not hitting a vital compiler optimization?
DMD compiler doesn't perform many optimizations, especially on floating point computations.
More precisely: In terms of optimizations performed, DMD isn't too far behind gcc. But it performs almost no optimization on floating point. Also, the inliner doesn't yet support the newer D features (this won't be hard to fix) and the scheduler is based on Pentium1.
 Have to be careful when talking about floating point optimizations. For
 example,

     x/c  =>  x * 1/c

 is not done because of roundoff error. Also,

     0 * x  =>  0

 is also not done because it is not a correct replacement if x is a NaN.

The most glaring limitation of the FP optimiser is that it seems to never keep values in the FP stack. So that it will often do:

    FSTP x
    FLD x

instead of

    FST x

Fixing this would probably give a speedup of ~20% on almost all FP code, and would unlock the path to further optimisation.
May 17 2010
parent BCS <none anon.com> writes:
Hello Don,

 The most glaring limitation of the FP optimiser is that it seems to
 never keep values in the FP stack. So that it will often do:
 FSTP x
 FLD x
 instead of FST x
 Fixing this would probably give a speedup of ~20% on almost all FP
 code, and would unlock the path to further optimisation.
Does DMD have the groundwork for doing FP peephole optimizations? That sounds like an easy one.

-- 
... <IXOYE><
May 17 2010
prev sibling parent reply Walter Bright <newshound1 digitalmars.com> writes:
bearophile wrote:
 DMD compiler doesn't perform many optimizations,
This is simply false. DMD does an excellent job with integer and pointer operations. It does a so-so job with floating point. There are probably over a thousand optimizations at all levels that dmd does with integer and pointer code. Compare the generated code with and without -O. Even without -O, dmd does a long list of optimizations (such as common subexpression elimination).
May 16 2010
next sibling parent bearophile <bearophileHUGS lycos.com> writes:
Walter Bright:
 This is simply false. DMD does an excellent job with integer and pointer 
 operations. It does a so-so job with floating point.
 There are probably over a thousand optimizations at all levels that dmd does 
 with integer and pointer code.
You are of course right, I understand your feelings; I must be more precise in my posts. Surely dmd performs numerous optimizations. What I meant was a comparison with other compilers, particularly ldc, and even then generic words about a generic comparison aren't useful. So I am sorry.

Bye,
bearophile
May 16 2010
prev sibling next sibling parent Brad Roberts <braddr puremagic.com> writes:
On 5/16/2010 4:15 PM, Walter Bright wrote:
 bearophile wrote:
 DMD compiler doesn't perform many optimizations,
This is simply false. DMD does an excellent job with integer and pointer operations. It does a so-so job with floating point. There are probably over a thousand optimizations at all levels that dmd does with integer and pointer code. Compare the generated code with and without -O. Even without -O, dmd does a long list of optimizations (such as common subexpression elimination).
While it's false that DMD doesn't do many optimizations, it's true that it's behind more modern compiler optimizers. I've been working to fix some of the grossly bad holes in dmd's inliner, which is one area that's just obviously lacking (see bug 2008). But gcc and ldc (and likely msvc, though I lack any direct knowledge) are simply a decade or so ahead.

It's not a criticism of dmd or a suggestion that the priorities are in the wrong place, just a point of fact. They've got larger teams of people and are spending significant time on just improving and adding optimizations.

Later,
Brad
May 16 2010
prev sibling parent Joseph Wakeling <joseph.wakeling webdrake.net> writes:
On 05/17/2010 01:15 AM, Walter Bright wrote:
 bearophile wrote:
 DMD compiler doesn't perform many optimizations,
This is simply false. DMD does an excellent job with integer and pointer operations. It does a so-so job with floating point.
Interesting to note, relative to my earlier experience with D vs. C++ speed:

http://www.digitalmars.com/pnews/read.php?server=news.digitalmars.com&group=digitalmars.D.learn&artnum=19567

I'll have to try and put together a no-floating-point bit of code to make a comparison.

Best wishes,

-- Joe
May 19 2010
prev sibling next sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Thu, 13 May 2010 22:38:40 -0400, kai <kai nospam.zzz> wrote:

 Hello,

 I was evaluating using D for some numerical stuff. However I was  
 surprised to
 find that looping & array indexing was not very speedy compared to
 alternatives (gcc et al). I was using the DMD2 compiler on mac and  
 windows,
 with -O -release. Here is a boiled down test case:

 	void main (string[] args)
 	{
 		double [] foo = new double [cast(int)1e6];
 		for (int i=0;i<1e3;i++)
 		{
 			for (int j=0;j<1e6-1;j++)
 			{
 				foo[j]=foo[j]+foo[j+1];
 			}
 		}
 	}

 Any ideas? Am I somehow not hitting a vital compiler optimization?  
 Thanks for
 your help.
I figured it out.  In D, the default value for doubles is nan, so you are adding countless scores of nan's, which is costly for some reason (not a big floating point guy, so I'm not sure about this).  In C/C++, the default value for doubles is 0.

BTW, without any initialization of the array, what are you expecting the code to do?  In the C++ version, I suspect you are simply adding a bunch of 0s together.

Equivalent D code which first initializes the array to 0s:

void main (string[] args)
{
    double [] foo = new double [cast(int)1e6];
    foo[] = 0; // probably want to change this to something more meaningful
    for (int i=0;i<cast(int)1e3;i++)
    {
        for (int j=0;j<cast(int)1e6-1;j++)
        {
            foo[j]+=foo[j+1];
        }
    }
}

On my PC, it runs almost exactly at the same speed as the C++ version.

-Steve
May 14 2010
next sibling parent reply kai <kai nospam.zzz> writes:
Thanks for the help all!

 2. Can you use vector operations?  If the example you gave is
 representative of your specific problem, then you can't because you are
 adding overlapping parts of the array.  But if you are doing operations
 on separate arrays, then array operations will be *much* faster.
Unfortunately, I don't think I will be able to. The actual code is computing norms of a sequence of points and then updating their values as needed (MLE smoothing/prediction).
 For that evaluation you probably have to use the LDC compiler, that is
 able to optimize better.
I was scared off by the warning that D 2.0 support is experimental. I realize D 2 itself is still non-production, but for academic interests industrial-strength isn't all that important if it usually works :).
 Using floating point for indexes and lengths is not a good practice.
 In D large numbers are written like 1_000_000. Use -release too.
Good to know, thanks (that's actually a great feature for scientists!).
 DMD compiler doesn't perform many optimizations, especially on floating
 point computations. But the bigger problem in your code is that you are
 performing operations on NaNs (that's the default initalization of FP
 values in D), and operations on NaNs are usually quite slower.
 in D, the default value for doubles is nan, so you are adding countless
 scores of nan's which is costly for some reason (not a big floating point
 guy, so I'm not sure about this).
Ah ha, that was it -- serves me right for trying to boil down a test case and failing miserably. I'll head back to my code now and try to find the real problem :-) At some point I obviously removed the initialization of the data.
May 14 2010
parent reply bearophile <bearophileHUGS lycos.com> writes:
kai:

 I was scared off by the warning that D 2.0 support is experimental.
LDC is D1 still, mostly :-( And at the moment it uses LLVM 2.6. LLVM 2.7 contains a new optimization that can improve that code some more.
 Good to know, thanks (thats actually a great feature for scientists!).
In theory D is a reasonable fit for numerical computations too, but there is a lot of work to do still, and some parts of D's design will need to be improved to help numerical code performance.

From my extensive tests, if you use it correctly, D1 code compiled with LDC can be about as efficient as C code compiled with GCC, or sometimes a little more efficient.

-------------

Steven Schveighoffer:
 In C/C++, the default value for doubles is 0.
I think in C and C++ the default value for doubles is "uninitialized" (that is anything).

Bye,
bearophile
May 14 2010
next sibling parent reply "Jérôme M. Berger" <jeberger free.fr> writes:
bearophile wrote:
 kai:

 I was scared off by the warning that D 2.0 support is experimental.

 LDC is D1 still, mostly :-( And at the moment it uses LLVM 2.6. LLVM 2.7
 contains a new optimization that can improve that code some more.

 Good to know, thanks (thats actually a great feature for scientists!).

 In theory D is a bit fit for numerical computations too, but there is lot
 of work to do still. And some parts of D design will need to be improved
 to help numerical code performance.

 From my extensive tests, if you use it correctly, D1 code compiled with
 LDC can be about as efficient as C code compiled with GCC or sometimes a
 little more efficient.

 -------------

 Steven Schveighoffer:
 In C/C++, the default value for doubles is 0.

 I think in C and C++ the default value for doubles is "uninitialized"
 (that is anything).

	That depends. In C/C++, the default value for any global variable is to
have all bits set to 0, whatever that means for the actual data type. The
default value for local variables and malloc/new memory is "whatever was in
this place in memory before", which can be anything. The default value for
calloc is to have all bits set to 0, as for global variables.

	In the OP code, the malloc will probably return memory that has never
been used before, therefore probably initialized to 0 too (OS dependent).

		Jerome
-- 
mailto:jeberger free.fr
http://jeberger.free.fr
Jabber: jeberger jabber.fr
May 14 2010
parent reply div0 <div0 users.sourceforge.net> writes:

Jérôme M. Berger wrote:
 	That depends. In C/C++, the default value for any global variable
 is to have all bits set to 0 whatever that means for the actual data
 type. 
No it's not, it's always uninitialized.

Visual studio will initialise memory & a function's stack segment with 0xcd, but only in debug builds. In release mode you get what was already there. That used to be the case with gcc (which used 0xdeadbeef) as well, unless they've changed it.

-- 
My enormous talent is exceeded only by my outrageous laziness.
http://www.ssTk.co.uk
May 15 2010
parent reply "Jérôme M. Berger" <jeberger free.fr> writes:
div0 wrote:
 Jérôme M. Berger wrote:
 	That depends. In C/C++, the default value for any global variable
 is to have all bits set to 0 whatever that means for the actual data
 type.

 No it's not, it's always uninitialized.

	According to the C89 standard and onwards it *must* be initialized to 0.
If it isn't, then your implementation isn't standard compliant (needless
to say, gcc, Visual, llvm, icc and dmc are all standard compliant, so you
won't have any difficulty checking).

 Visual studio will initialise memory & a functions stack segment with
 0xcd, but only in debug builds. In release mode you get what was already
 there. That used to be the case with gcc (which used 0xdeadbeef) as well
 unless they've changed it.

	This does not concern global variables. Therefore the second part of my
message applies, the part you didn't quote:

 The default value for local variables and malloc/new memory is
 "whatever was in this place in memory before" which can be anything.
 The default value for calloc is to have all bits to 0 as for global
 variables.

	I should have added that some compilers / standard libraries allow you
to have a default initialization value for debugging purposes.

		Jerome
-- 
mailto:jeberger free.fr
http://jeberger.free.fr
Jabber: jeberger jabber.fr
parent reply div0 <div0 users.sourceforge.net> writes:

Jérôme M. Berger wrote:
 div0 wrote:
 Jérôme M. Berger wrote:
 	That depends. In C/C++, the default value for any global variable
 is to have all bits set to 0 whatever that means for the actual data
 type. 
No it's not, it's always uninitialized.
According to the C89 standard and onwards it *must* be initialized to 0. If it isn't then your implementation isn't standard compliant (needless to say, gcc, Visual, llvm, icc and dmc are all standard compliant, so you won't have any difficulty checking).
Ah, I only do C++, where the standard is to not initialise. I didn't know the two specs had diverged like that.

-- 
My enormous talent is exceeded only by my outrageous laziness.
http://www.ssTk.co.uk
May 16 2010
next sibling parent "Jouko Koski" <joukokoskispam101 netti.fi> writes:
"div0" <div0 users.sourceforge.net> wrote:
 Jérôme M. Berger wrote:
 That depends. In C/C++, the default value for any global variable
 is to have all bits set to 0 whatever that means for the actual data
 type.
Ah, I only do C++, where the standard is to not initialise.
No, in C++ all *global or static* variables are zero-initialized. By default, stack variables are default-initialized, which means that doubles on the stack can have any value (they are uninitialized).

The C function calloc is required to fill the newly allocated memory with a zero bit pattern; malloc is not required to initialize anything. Fresh heap areas given by malloc may have a zero bit pattern, but one should really make no assumptions about this.

-- Jouko
May 16 2010
prev sibling parent "Jérôme M. Berger" <jeberger free.fr> writes:
div0 wrote:
 Jérôme M. Berger wrote:
 div0 wrote:
 Jérôme M. Berger wrote:
 	That depends. In C/C++, the default value for any global variable
 is to have all bits set to 0 whatever that means for the actual data
 type.

 No it's not, it's always uninitialized.

 According to the C89 standard and onwards it *must* be initialized
 to 0. If it isn't then your implementation isn't standard compliant
 (needless to say, gcc, Visual, llvm, icc and dmc are all standard
 compliant, so you won't have any difficulty checking).

 Ah, I only do C++, where the standard is to not initialise. I didn't
 know the two specs had diverged like that.

	The specs haven't diverged, and C++ has mostly the same behaviour as C
where global variables are concerned. The only difference is that if the
global variable is a class with a constructor, then that constructor gets
called after the memory is zeroed out.

		Jerome
-- 
mailto:jeberger free.fr
http://jeberger.free.fr
Jabber: jeberger jabber.fr
prev sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Fri, 14 May 2010 12:40:52 -0400, bearophile <bearophileHUGS lycos.com>  
wrote:

 Steven Schveighoffer:
 In C/C++, the default value for doubles is 0.
 I think in C and C++ the default value for doubles is "uninitialized"
 (that is anything).

You are probably right.  All I did to figure this out was print out the first element of the array in my C++ version of kai's code.  So it may be arbitrarily set to 0.

-Steve
May 17 2010
prev sibling parent reply Ali Çehreli <acehreli yahoo.com> writes:
Steven Schveighoffer wrote:

     double [] foo = new double [cast(int)1e6];
     foo[] = 0;
I've discovered that this is the equivalent of the last line above:

    foo = 0;

I don't see it in the spec. Is that an old or an unintended feature?

Ali
May 15 2010
next sibling parent reply "Simen kjaeraas" <simen.kjaras gmail.com> writes:
Ali Çehreli <acehreli yahoo.com> wrote:

 Steven Schveighoffer wrote:

  >     double [] foo = new double [cast(int)1e6];
  >     foo[] = 0;

 I've discovered that this is the equivalent of the last line above:

    foo = 0;

 I don't see it in the spec. Is that an old or an unintended feature?

Looks unintended to me.  In fact (though that might be the C programmer in me doing the thinking), it looks to me like foo = null;. It might be related to the discussion in digitalmars.D "Is [] mandatory for array operations?".

-- 
Simen
May 15 2010
parent =?UTF-8?B?QWxpIMOHZWhyZWxp?= <acehreli yahoo.com> writes:
Simen kjaeraas wrote:
 Ali Çehreli <acehreli yahoo.com> wrote:

 Steven Schveighoffer wrote:

  >     double [] foo = new double [cast(int)1e6];
  >     foo[] = 0;

 I've discovered that this is the equivalent of the last line above:

    foo = 0;

 I don't see it in the spec. Is that an old or an unintended feature?
I have to make a correction: It works with fixed-sized arrays. It does not work with the dynamic array initialization above.
 Looks unintended to me.  In fact (though that might be the
 C programmer in me doing the thinking), it looks to me like
 foo = null;. It might be related to the discussion in
 digitalmars.D "Is [] mandatory for array operations?".
Thanks, Ali
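A minimal sketch of the two spellings being discussed (illustration only,
assuming a D2 compiler; the variable names are made up):

    void main()
    {
        double[3] fixedArr;
        fixedArr[] = 0;   // explicit slice syntax: sets every element to 0
        fixedArr = 0;     // the bracket-less form discussed above; dmd accepts
                          // it for fixed-size arrays and it fills every element

        double[] dynArr = new double[3];
        dynArr[] = 0;     // fine: element-wise assignment
        // dynArr = 0;    // does not compile: a scalar cannot be assigned
                          // to a dynamic array
    }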
May 15 2010
prev sibling parent bearophile <bearophileHUGS lycos.com> writes:
Ali Çehreli:
 I don't see it in the spec. Is that an old or an unintended feature?
It's a compiler bug; don't use that bracket-less syntax in your programs. Don is fighting to fix such problems (and I have written several posts and bug reports on that stuff).

Bye,
bearophile
May 15 2010
prev sibling parent reply Walter Bright <newshound1 digitalmars.com> writes:
kai wrote:

 Here is a boiled down test case:
 
 	void main (string[] args)
 	{
 		double [] foo = new double [cast(int)1e6];
 		for (int i=0;i<1e3;i++)
 		{
 			for (int j=0;j<1e6-1;j++)
 			{
 				foo[j]=foo[j]+foo[j+1];
 			}
 		}
 	}
 
 Any ideas?
    for (int j=0;j<1e6-1;j++)

The j<1e6-1 is a floating point operation. It should be redone as an int one:

    j<1_000_000-1
May 21 2010
parent bearophile <bearophileHUGS lycos.com> writes:
Walter Bright:
 for (int j=0;j<1e6-1;j++)
 
 The j<1e6-1 is a floating point operation. It should be redone as an int one:
       j<1_000_000-1
The syntax "1e6" can represent an integer value of one million as perfectly and as precisely as "1_000_000", but traditionally in many languages the exponential syntax is used to represent floating point values only, I don't know why. If the OP wants a short syntax to represent one million, this syntax can be used in D2: foreach (j; 0 .. 10^^6) Bye, bearophile
May 22 2010