digitalmars.D.learn - -noboundscheck

reply "Nvirjskly" <nvirjskly gmail.com> writes:
Compiling my code with the -noboundscheck flag sped it up by 
almost 5 times (whilst passing all tests and working exactly the 
same way). Is bounds checking really that expensive, and what 
other simple optimisations can I perform other than -inline -O 
-noboundscheck?
Aug 19 2012
next sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
Nvirjskly:

 is bounds checking really that expensive,
The D front-end is very dumb in this, as far as I know it makes no attempts to remove those tests where they can't fail. Walter believes such optimizations don't gain much.
 what other simple optimisations can I perform other than 
 -inline -O -noboundscheck?
Compiler options change across different compilers. What compiler are you using? Bye, bearophile
Aug 19 2012
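To illustrate where the checks come from, here is a minimal sketch (the function names are made up for this example and are not from the cryptod repository). Each `a[i]` is compared against `a.length` at run time unless the build disables bounds checking. Iterating with `foreach` over the elements leaves no index expression in user code to check, and indexing through `.ptr` skips the check explicitly, at the cost of memory safety.

```d
import std.stdio;

// Indexed access: every a[i] is bounds-checked unless checks are disabled.
uint sumIndexed(const(uint)[] a)
{
    uint s = 0;
    for (size_t i = 0; i < a.length; ++i)
        s += a[i];
    return s;
}

// foreach over the elements: no index expression for the compiler to check.
uint sumForeach(const(uint)[] a)
{
    uint s = 0;
    foreach (x; a)
        s += x;
    return s;
}

// Indexing through .ptr: never bounds-checked, so the loop condition
// is the only thing keeping the access in range.
uint sumUnchecked(const(uint)[] a)
{
    uint s = 0;
    for (size_t i = 0; i < a.length; ++i)
        s += a.ptr[i];
    return s;
}

void main()
{
    uint[] data = [1, 2, 3, 4];
    // All three variants compute the same sum.
    writeln(sumIndexed(data), " ", sumForeach(data), " ", sumUnchecked(data));
}
```

Whether rewriting hot loops this way recovers most of the -noboundscheck speed-up depends on the code and the compiler, so it is worth measuring rather than assuming.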
prev sibling parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Sunday, August 19, 2012 21:29:38 Nvirjskly wrote:
 Compiling my code with the -noboundscheck flag sped it up by
 almost 5 times (whilst passing all tests and working exactly the
 same way,) is bounds checking really that expensive, and what
 other simple optimisations can I perform other than -inline -O
 -noboundscheck?
It would depend entirely on your code. In most cases, I wouldn't expect to see a speed up anywhere near that large. But if you're constantly accessing arrays and doing little other computation, then maybe you do. I have no idea what your code is doing. dmd's optimizer isn't the best anyway. It compiles much faster than gdc and ldc do, but it usually generates slower code (the focus on dmd has generally been getting everything working correctly rather than optimizing everything to death, though that should change with time). Whatever the situation with your code is, I'd expect that the situation with its optimizations would change quite a bit with one of the other D compilers. - Jonathan M Davis
Aug 19 2012
parent reply "Nvirjskly" <nvirjskly gmail.com> writes:
On Sunday, 19 August 2012 at 20:07:32 UTC, Jonathan M Davis wrote:
 On Sunday, August 19, 2012 21:29:38 Nvirjskly wrote:
 Compiling my code with the -noboundscheck flag sped it up by
 almost 5 times (whilst passing all tests and working exactly 
 the
 same way,) is bounds checking really that expensive, and what
 other simple optimisations can I perform other than -inline -O
 -noboundscheck?
It would depend entirely on your code. In most cases, I wouldn't expect to see a speed up anywhere near that large. But if you're constantly accessing arrays and doing little other computation, then maybe you do. I have no idea what your code is doing.
I am using dmd. Yes, my code is extremely array heavy with many array-based computations (Shame-less plug: https://github.com/Nvirjskly/cryptod)
 dmd's optimizer isn't the best anyway. It compiles much faster 
 than gdc and
 ldc do, but it usually generates slower code (the focus on dmd 
 has generally
 been getting everything working correctly rather than 
 optimizing everything to
 death, though that should change with time). Whatever the 
 situation with your
 code is, I'd expect that the situation with its 
 optimizations would
 change quite a bit with one of the other D compilers.

 - Jonathan M Davis
Ah, that makes a lot of sense. If my goal is a fast running time would it then make sense to use another compiler? I heard that gdc development is lagging behind and that ldc might not even support D2 all that well?
Aug 19 2012
next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Sunday, August 19, 2012 22:13:15 Nvirjskly wrote:
 Ah, that makes a lot of sense. If my goal is a fast running time
 would it then make sense to use another compiler? I heard that
 gdc development is lagging behind and that ldc might not even
 support D2 all that well?
Both gdc and ldc support D2, though sometimes they're a release behind (especially right after a new dmd release). I don't remember the sites for them, but I do recall that one or both of them have had issues where their old site is generally the first one that you find, so it looks like they don't support D2. But that's an issue with hits and google, not the compilers themselves. But if you want your code to be as fast as possible, then use either gdc or ldc, though I don't know which is better (it probably depends on your code). - Jonathan M Davis
Aug 19 2012
prev sibling parent reply 1100110 <10equals2 gmail.com> writes:
I have gdc, dmd, and ldc installed on my computer.

I also forked your repo two minutes before reading this.


Tell me what you want, and I'll run whatever tests you want.
But in return, I'm stealing your whirlpool (with attribution, of course).
Aug 19 2012
parent reply "Nvirjskly" <nvirjskly gmail.com> writes:
On Sunday, 19 August 2012 at 21:11:13 UTC, 1100110 wrote:
 I have gdc, dmd, and ldc installed on my computer.

 I also forked your repo two minutes before reading this.


 Tell me what you want, and I'll run whatever tests you want.
 But in return, I'm stealing your whirlpool (with attribution, of 
 course).
Haha I actually do not have whirlpool implemented yet (it's an empty file), but since you seem to want it, it's right at the top of my TODO list (if I'm lucky I'll get it done by the end of today, but best bet is this time tomorrow. I already have the spec open.)

benchmark.d contains a main function that runs some rudimentary benchmarks if you want to compile it with that...

    import std.process, std.stdio, std.file, std.path;

    void main()
    {
        string files = "";
        foreach (string name; dirEntries("src", SpanMode.breadth))
        {
            if(name.isFile())
                files ~= name ~ " ";
        }
        string command = "dmd " ~ files ~ "benchmark.d -ofcryptod -noboundscheck -O -release -inline";
        writeln(shell(command));
    }

should compile that with dmd. I'm not sure about ldc or gdc and their compiler options, but it should be something similar...
Aug 19 2012
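As a rough sketch of how the same build could be pointed at the other compilers, the function below assembles one command line per compiler. The flag sets are the ones quoted in this thread (the dmd flags from the script above, the gdc flags from the follow-up post); the function name `buildCommand`, the placeholder file list, and the GCC-style `-o cryptod` output flag for gdc are assumptions made for illustration. It only prints the commands and does not run them.

```d
import std.stdio;

// Returns a build command for the given compiler.
// `files` is a space-terminated list of source files, as built by the
// script in the post above.
string buildCommand(string compiler, string files)
{
    // ldmd2 is LDC's wrapper that accepts dmd-style flags.
    if (compiler == "dmd" || compiler == "ldmd2")
        return compiler ~ " " ~ files
            ~ "benchmark.d -ofcryptod -noboundscheck -O -release -inline";

    // Flags as quoted later in the thread; "-o cryptod" is an assumption.
    if (compiler == "gdc")
        return "gdc " ~ files
            ~ "benchmark.d -o cryptod -O3 -march=native -frelease"
            ~ " -fno-bounds-check -finline -ffast-math";

    throw new Exception("unknown compiler: " ~ compiler);
}

void main()
{
    // Placeholder file list; the real script collects these from src/.
    string files = "src/a.d src/b.d ";
    foreach (c; ["dmd", "ldmd2", "gdc"])
        writeln(buildCommand(c, files));
}
```

Keeping the flag sets in one place makes it easier to compare like with like when timing the resulting binaries.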
next sibling parent 1100110 <10equals2 gmail.com> writes:
Yeah, I figured it out.  I did have to rename src though...

I ran a few tests, inconclusive for any serious difference.
gdc is now compiling with -O3 -march=native -frelease -fno-bounds-check  
-finline -ffast-math.

But no, dmd has the shortest compile times, gdmd the longest.
I'm timing everything right now.

...My laptop is getting hot...

I want to see how bad it crashes.  =P
Aug 19 2012
prev sibling parent reply 1100110 <10equals2 gmail.com> writes:
Here are my results!  iirc -release implies -noboundscheck..
Also I am on x64, and these files only compile to 32bit. So there could be  
performance missing there.

rdmd --force -I../ -m32 -O -inline -release benchmark.d
26.00s user 0.23s system 99% cpu 26.386 total
---
2048 md2 in 1003 milliseconds: 15.9521 Mib/s
32768 md4 in 682 milliseconds: 375.367 Mib/s
32768 md5 in 426 milliseconds: 600.939 Mib/s
8192 ripemd160 in 779 milliseconds: 82.1566 Mib/s
4096 sha1 in 276 milliseconds: 115.942 Mib/s
16777216 ints generated by mersenne twister in 1146 milliseconds: 446.771 Mib/s
256 ints generated by BlumBlumShub in 812 milliseconds: 0.00962131 Mib/s
1048576 texts blowfish encrypted in 645 milliseconds: 99.2248 Mib/s
65536 texts threefish encrypted in 2774 milliseconds: 5.76784 Mib/s
131072 texts AES128 encrypted in 896 milliseconds: 17.8571 Mib/s

rdmd --force -I../ -m32 benchmark.d
16.79s user 0.19s system 99% cpu 17.048 total
---
2048 md2 in 1546 milliseconds: 10.3493 Mib/s
32768 md4 in 1240 milliseconds: 206.452 Mib/s
32768 md5 in 1558 milliseconds: 164.313 Mib/s
8192 ripemd160 in 1535 milliseconds: 41.6938 Mib/s
4096 sha1 in 616 milliseconds: 51.9481 Mib/s
16777216 ints generated by mersenne twister in 1510 milliseconds: 339.073 Mib/s
256 ints generated by BlumBlumShub in 816 milliseconds: 0.00957414 Mib/s
1048576 texts blowfish encrypted in 1094 milliseconds: 58.5009 Mib/s
65536 texts threefish encrypted in 3316 milliseconds: 4.82509 Mib/s
131072 texts AES128 encrypted in 1945 milliseconds: 8.22622 Mib/s


(ldc && gdc REALLY hate building 32bit code...)


rdmd --compiler=ldmd2 --force -I../ -m32 -O -release -noboundscheck benchmark.d
2048 md2 in 570 milliseconds: 28.0702 Mib/s
32768 md4 in 765 milliseconds: 334.641 Mib/s
32768 md5 in 840 milliseconds: 304.762 Mib/s
8192 ripemd160 in 571 milliseconds: 112.084 Mib/s
4096 sha1 in 263 milliseconds: 121.673 Mib/s
16777216 ints generated by mersenne twister in 747 milliseconds: 685.408 Mib/s
core.exception.AssertError@/build/src/ldc-build/runtime/phobos/std/internal/math/biguintcore.d(2044): Assertion failure

real 0m8.957s
user 0m8.499s
sys 0m0.387s


rdmd --compiler=ldmd2 --force -I../ -m32 benchmark.d
2048 md2 in 2680 milliseconds: 5.97015 Mib/s
32768 md4 in 2088 milliseconds: 122.605 Mib/s
32768 md5 in 2465 milliseconds: 103.854 Mib/s
8192 ripemd160 in 2051 milliseconds: 31.2043 Mib/s
4096 sha1 in 742 milliseconds: 43.1267 Mib/s
16777216 ints generated by mersenne twister in 1580 milliseconds: 324.051 Mib/s
core.exception.AssertError@/build/src/ldc-build/runtime/phobos/std/internal/math/biguintcore.d(2044): Assertion failure

real 0m14.722s
user 0m14.412s
sys 0m0.230s

I think gdc died...
binary /usr/lib/gcc/x86_64-unknown-linux-gnu/4.8.0/cc1d
version v2.059
parse benchmark
importall benchmark
import import import ... [long run of repeated "import" lines; module names not preserved]
semantic benchmark
import import
semantic2 benchmark
semantic3 benchmark
import import
code benchmark
/usr/bin/ld: cannot find -lgphobos2
collect2: error: ld returned 1 exit status

real 0m15.950s
user 0m15.629s
sys 0m0.190s


I managed to force dmd and (partial) ldc builds for -m64
rdmd --force -O -m64 -release -noboundscheck -I../ benchmark.d
14.29s user 0.19s system 99% cpu 14.553 total
2048 md2 in 1026 milliseconds: 15.5945 Mib/s
32768 md4 in 737 milliseconds: 347.354 Mib/s
32768 md5 in 1078 milliseconds: 237.477 Mib/s
8192 ripemd160 in 922 milliseconds: 69.4143 Mib/s
4096 sha1 in 309 milliseconds: 103.56 Mib/s
16777216 ints generated by mersenne twister in 1079 milliseconds: 474.513 Mib/s
256 ints generated by BlumBlumShub in 3661 milliseconds: 0.00213398 Mib/s
1048576 texts blowfish encrypted in 593 milliseconds: 107.926 Mib/s
65536 texts threefish encrypted in 2376 milliseconds: 6.73401 Mib/s
131072 texts AES128 encrypted in 874 milliseconds: 18.3066 Mib/s

2048 md2 in 587 milliseconds: 27.2572 Mib/s
32768 md4 in 675 milliseconds: 379.259 Mib/s
32768 md5 in 752 milliseconds: 340.426 Mib/s
8192 ripemd160 in 539 milliseconds: 118.738 Mib/s
4096 sha1 in 236 milliseconds: 135.593 Mib/s
16777216 ints generated by mersenne twister in 684 milliseconds: 748.538 Mib/s
core.exception.AssertError@/build/src/ldc-build/runtime/phobos/std/internal/math/biguintcore.d(2044): Assertion failure


dmd -O -release -m64 -noboundscheck
2048 md2 in 1079 milliseconds: 14.8285 Mib/s
32768 md4 in 804 milliseconds: 318.408 Mib/s
32768 md5 in 1042 milliseconds: 245.681 Mib/s
8192 ripemd160 in 972 milliseconds: 65.8436 Mib/s
4096 sha1 in 324 milliseconds: 98.7654 Mib/s
16777216 ints generated by mersenne twister in 1072 milliseconds: 477.612 Mib/s
256 ints generated by BlumBlumShub in 3611 milliseconds: 0.00216353 Mib/s
1048576 texts blowfish encrypted in 581 milliseconds: 110.155 Mib/s
65536 texts threefish encrypted in 2456 milliseconds: 6.51466 Mib/s
131072 texts AES128 encrypted in 878 milliseconds: 18.2232 Mib/s


Please hold while gdc is being recompiled....
Aug 19 2012
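For reading the tables above, a small sketch of the arithmetic may help. The reported rates appear consistent with "Mib/s" meaning mebibits (2^20 bits) per second and with each hash benchmark processing 1024-byte messages; both are inferences from the numbers, not something stated in the thread, and the helper names are made up for this example. The speed-up figures compare the two 32-bit dmd runs (plain build versus -O -inline -release).

```d
import std.stdio;

// Throughput in mebibits per second for `items` inputs of `bytesPerItem`
// bytes each, processed in `millis` milliseconds.
double mibPerSec(ulong items, ulong bytesPerItem, ulong millis)
{
    immutable double bits = cast(double)(items * bytesPerItem * 8);
    return bits / (1024.0 * 1024.0) / (millis / 1000.0);
}

// How many times faster the optimized run is than the slow one.
double speedup(ulong slowMillis, ulong fastMillis)
{
    return cast(double) slowMillis / fastMillis;
}

void main()
{
    // "32768 md5 in 426 milliseconds", assuming 1024-byte messages.
    writefln("md5 throughput: %.3f Mib/s", mibPerSec(32768, 1024, 426));

    // dmd -m32: unoptimized time versus -O -inline -release time.
    writefln("md5 speedup:    %.2fx", speedup(1558, 426));
    writefln("AES128 speedup: %.2fx", speedup(1945, 896));
}
```

Under those assumptions the first line should reproduce the 600.939 Mib/s reported for md5 in the optimized 32-bit dmd run.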
next sibling parent reply "Nvirjskly" <nvirjskly gmail.com> writes:
On Sunday, 19 August 2012 at 23:48:36 UTC, 1100110 wrote:
 Here are my results!  iirc -release implies -noboundscheck..
 Also I am on x64, and these files only compile to 32bit. So 
 there could be
 performance missing there.
Wow, thanks. It looks like ldc2 does not play nice with std.bigint, which is all the more reason for me to use my own version. If you want to see it run and not assert out, remove benchmark_bbs(); from main() in benchmark.d.

std.bigint seems to have a lot of problems, as I had to repeatedly mess around with things that SHOULD work. I think I should file a few bug reports :/

I think GDC is dying because I have scope imports scattered everywhere and it might not play nice with those... bah.

So it looks like ldc2 produces somewhat faster code, apart from the fact that it did not play nice with std.bigint, and gdc does not follow the reference compiler in its support of scope imports... :/ So basically my code is dmd-only atm and can be easily converted to support ldc2, and maybe gdc if scope imports are the only problem...

On the topic of Whirlpool, I'm almost done with a naive non-optimised version, and just need to make the S-box mixin.
Aug 19 2012
parent reply 1100110 <10equals2 gmail.com> writes:
Really? That was quick. I didn't get very far with my attempt. =P
Aug 19 2012
parent reply "Nvirjskly" <nvirjskly gmail.com> writes:
 Really?  that was quick.

 I didn't get very far with my attempt.  =P
Ok, committing a broken version. Broken in the sense that it does not work correctly as of yet. *Something* is not working properly. I'll work on finding out what exactly some more today and tomorrow, but it is in a "usable" state, where by usable I mean that it returns hash values, just not ones matching any test vectors... I have to figure out where my silly mistake is.
Aug 19 2012
parent "Nvirjskly" <nvirjskly gmail.com> writes:
Quickly replying that I already found a few mistakes, and am looking for more.
Aug 19 2012
prev sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
I can't understand what command line arguments you are using for 
LDC and GDC, but both of them have many useful optimization 
arguments (some of them are not easy to use like link time 
optimization in LDC).

Bye,
bearophile
Aug 19 2012