digitalmars.D.learn - Help optimizing code?

Lily <yulex.42 gmail.com> writes:
I started learning D a few days ago, coming from some very basic 
C++ knowledge, and I'd like some help getting a program to run 
faster. The code is here: 
https://github.com/IndigoLily/D-mandelbrot/blob/master/mandelbrot.d

Right now it runs slower than my JavaScript Mandelbrot renderer 
on the same quality settings, which is clearly ridiculous, but I 
don't know what to do to fix it. Sorry for the lack of comments, 
but I can never tell what will and won't be obvious to other 
people.
Jan 01
Adam D. Ruppe <destructionator gmail.com> writes:
On Monday, 1 January 2018 at 15:09:53 UTC, Lily wrote:
 I started learning D a few days ago, coming from some very 
 basic C++ knowledge, and I'd like some help getting a program 
 to run faster.
So a few easy things you can do:

1) use `float` instead of `real`. real sucks, it is really slow and weird. Making that one switch doubled the speed on my computer.

2) preallocate the imageData. before the loop, `imageData.reserve(width*height*3)`. Small savings on my computer but an easy one.

3) make sure you use the compiler optimization options like `-O` and `-inline` on dmd (or use the gdc and ldc compilers both of which generally optimize better than dmd out of the box).

And if that isn't enough we can look into smaller things, but these overall brought the time down to about 1/3 what it started on my box.
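
For illustration, here is a minimal sketch of points 1 and 2 together. It is not the code from the linked mandelbrot.d; the function names (`iterations`, `render`), the greyscale colouring, and the view bounds are all assumptions made for the example.

```d
// Sketch only: names and the pixel-to-plane mapping are hypothetical,
// not taken from the original mandelbrot.d.

/// Escape-time count for the point (cx, cy), using float rather than real.
uint iterations(float cx, float cy, uint maxIter)
{
    float x = 0, y = 0;
    uint i = 0;
    while (i < maxIter && x * x + y * y <= 4.0f)
    {
        immutable float xNew = x * x - y * y + cx;
        y = 2 * x * y + cy;
        x = xNew;
        ++i;
    }
    return i;
}

/// Renders a greyscale RGB buffer, reserving the full size up front.
ubyte[] render(uint width, uint height, uint maxIter)
{
    ubyte[] imageData;
    imageData.reserve(width * height * 3); // reserve once, before the loop

    foreach (py; 0 .. height)
    {
        foreach (px; 0 .. width)
        {
            // Map the pixel to roughly [-2.5, 1] x [-1, 1] on the complex plane.
            immutable float cx = -2.5f + 3.5f * px / width;
            immutable float cy = -1.0f + 2.0f * py / height;
            immutable n = iterations(cx, cy, maxIter);
            // Points that never escape are drawn black.
            immutable shade = cast(ubyte)(n == maxIter ? 0 : 255 * n / maxIter);
            imageData ~= shade;
            imageData ~= shade;
            imageData ~= shade;
        }
    }
    return imageData;
}
```

Building with something like `dmd -O -inline` covers point 3. Swapping `float` for `double` in the sketch is a one-word change per declaration, so it is easy to time both.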
Jan 01
user1234 <user1234 12.nl> writes:
On Monday, 1 January 2018 at 15:23:19 UTC, Adam D. Ruppe wrote:
 On Monday, 1 January 2018 at 15:09:53 UTC, Lily wrote:
 I started learning D a few days ago, coming from some very 
 basic C++ knowledge, and I'd like some help getting a program 
 to run faster.
So a few easy things you can do: 1) use `float` instead of `real`. real sucks, it is really slow and weird. Making that one switch doubled the speed on my computer.
Yes, I've also advised double. Double is better if the target arch is X86_64, since part of the operations will be done with SSE. With "real" the OP was **sure** to get 100% of the maths done in the FPU (although for all the trig stuff there's no choice).
 2) preallocate the imageData. before the loop, 
 `imageData.reserve(width*height*3)`. Small savings on my 
 computer but an easy one.

 3) make sure you use the compiler optimization options like 
 `-O` and `-inline` on dmd (or use the gdc and ldc compilers 
 both of which generally optimize better than dmd out of the 
 box).


 And if that isn't enough we can look into smaller things, but 
 these overall brought the time down to about 1/3 what it 
 started on my box.
Jan 01
user1234 <user1234 12.nl> writes:
On Monday, 1 January 2018 at 15:09:53 UTC, Lily wrote:
 I started learning D a few days ago, coming from some very 
 basic C++ knowledge, and I'd like some help getting a program 
 to run faster. The code is here: 
 https://github.com/IndigoLily/D-mandelbrot/blob/master/mandelbrot.d

 Right now it runs slower than my JavaScript Mandelbrot renderer 
 on the same quality settings, which is clearly ridiculous, but 
 I don't know what to do to fix it. Sorry for the lack of 
 comments, but I can never tell what will and won't be obvious 
 to other people.
- The first thing is to compile with the best options:

      dmd mandelbrot.d -O -release -inline -boundscheck=off

- You append a lot, which can cause reallocs for imageData; Try

      import std.array;
      Appender!(ubyte[]) imageData;

  The code will not have to be changed for "~=" since Appender overloads this operator.

- I'd use "double" instead of "real".
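
A small sketch of the Appender suggestion, assuming a hypothetical helper `fillPixels` that just writes the same colour for every pixel (the real program would compute each colour instead):

```d
import std.array : appender;

/// Builds an RGB buffer of `pixelCount` pixels, all the same colour.
/// `fillPixels` is a made-up name for this example.
ubyte[] fillPixels(size_t pixelCount, ubyte r, ubyte g, ubyte b)
{
    auto imageData = appender!(ubyte[])();
    imageData.reserve(pixelCount * 3); // optional: Appender also grows on demand

    foreach (_; 0 .. pixelCount)
    {
        imageData ~= r; // same `~=` syntax as with a plain array
        imageData ~= g;
        imageData ~= b;
    }
    return imageData.data; // the accumulated ubyte[]
}
```

The one thing that does change compared with a plain array is the final step: the bytes are read back through `.data` when it is time to write the file.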
Jan 01
Adam D. Ruppe <destructionator gmail.com> writes:
On Monday, 1 January 2018 at 15:29:28 UTC, user1234 wrote:
     dmd mandelbrot.d -O -release -inline -boundscheck=off
-O and -inline are OK, but -release and -boundscheck are harmful and shouldn't be used. Yeah, you can squeeze a bit of speed out of them, but there's another way to do it - `.ptr` on the individual accesses or versioning out unwanted `assert` statements - and those avoid major bug and security baggage that -release and -boundscheck=off bring. In this program, I didn't see a major improvement with the boundscheck skipping... and in this program, it seems to be written without the bugs, but still, I am against that switch on principle. It is so so so easy to break things with them.
 - I'd use "double" instead of "real".
On my computer at least, float gave 2x speed compared to double. You could try both though and see which works better.
Jan 01
Muld <2 2.2> writes:
On Monday, 1 January 2018 at 15:54:33 UTC, Adam D. Ruppe wrote:
 On Monday, 1 January 2018 at 15:29:28 UTC, user1234 wrote:
     dmd mandelbrot.d -O -release -inline -boundscheck=off
-O and -inline are OK, but -release and -boundscheck are harmful and shouldn't be used. Yeah, you can squeeze a bit of speed out of them, but there's another way to do it - `.ptr` on the individual accesses or versioning out unwanted `assert` statements - and those avoid major bug and security baggage that -release and -boundscheck=off bring.
If you use .ptr then you get zero detection, even in debug builds.
 In this program, I didn't see a major improvement with the 
 boundscheck skipping... and in this program, it seems to be 
 written without the bugs, but still, I am against that switch 
 on principle. It is so so so easy to break things with them.
This program is relatively small and doesn't look like it does its calculations in realtime. I'd rather there be a potential bug than the program running too slow to be usable, or have zero debugging for indices in debug builds.
Jan 01
Adam D. Ruppe <destructionator gmail.com> writes:
On Monday, 1 January 2018 at 16:13:37 UTC, Muld wrote:
 If you use .ptr then you get zero detection, even in debug 
 builds.
It is limited to the one expression where you wrote it, instead of on the ENTIRE program like the build switches do. It is a lot easier to check correctness in an individual expression than it is to check the entire program, including stuff you didn't even realize might have been a problem. With the .ptr pattern, it is correct by default and you individually change ones you (should) look carefully at. With -boundscheck, it is wrong by default and most people don't even look at it - people suggest it to newbies as an optimization without mentioning how nasty it is.
 I'd rather there be a potential bug than the program running too 
 slow to be usable
That's a ridiculous exaggeration. In this program, I saw a < 1% time difference using those flags. -O -inline make a 50x bigger difference!
 or have zero debugging for indices in debug builds.
You shouldn't be using .ptr until after you've carefully checked and debugged the line of code where you are writing it. That's the beauty of the pattern: it only affects one line of code, so you can test it before you use it without affecting the rest of the program.
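
To make the pattern concrete, here is a hedged sketch on a made-up function (`sumUnchecked` is not from the program under discussion). Only the one reviewed access goes through `.ptr`; every other index in the program keeps its bounds check.

```d
/// Sums a slice, skipping the bounds check on one reviewed access only.
/// Hypothetical example function, default (@system) safety level.
long sumUnchecked(const(int)[] values)
{
    long total = 0;
    foreach (i; 0 .. values.length)
    {
        // `i` is always < values.length here, so the unchecked access is in range.
        // `values[i]` would be bounds-checked; `values.ptr[i]` is not.
        total += values.ptr[i];
    }
    return total;
}
```

Note that `.ptr` indexing is rejected in `@safe` code, which is one more way the unchecked spots stay easy to find.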
Jan 01
Muld <2 2.2> writes:
On Monday, 1 January 2018 at 16:47:40 UTC, Adam D. Ruppe wrote:
 On Monday, 1 January 2018 at 16:13:37 UTC, Muld wrote:
 If you use .ptr then you get zero detection, even in debug 
 builds.
It is limited to the one expression where you wrote it, instead of on the ENTIRE program like the build switches do. It is a lot easier to check correctness in an individual expression than it is to check the entire program, including stuff you didn't even realize might have been a problem. With the .ptr pattern, it is correct by default and you individually change ones you (should) look carefully at. With -boundscheck, it is wrong by default and most people don't even look at it - people suggest it to newbies as an optimization without mentioning how nasty it is.
It won't be just one line though, when you pretty much have to use it EVERYWHERE to get the optimization you want. It makes more sense to just turn off the check for the entire program and use your own asserts() where they are actually needed. That way you still get the checks in debug builds and have asserts where they are actually necessary.
 I'd rather there be a potential bug than the program running 
 too slow to be usable
That's a ridiculous exaggeration. In this program, I saw a < 1% time difference using those flags. -O -inline make a 50x bigger difference!
Read the sentence right before this.. Jesus. People only read what they want.
 or have zero debugging for indices in debug builds.
You shouldn't be using .ptr until after you've carefully checked and debugged the line of code where you are writing it. That's the beauty of the pattern: it only affects one line of code, so you can test it before you use it without affecting the rest of the program.
It won't just be one line, and that's not beautiful. What happens when code gets refactored? Are you constantly going to be flip-flopping the source code rather than flipping a compiler flag or using multiple build configurations? How long are you even going to test for? The error that might happen in the code is probably difficult to detect; if it wasn't, then having bounds checking at all wouldn't be necessary. Just test your code, that's the beauty of testing!
Jan 01
Uknown <sireeshkodali1 gmail.com> writes:
On Monday, 1 January 2018 at 15:09:53 UTC, Lily wrote:
 I started learning D a few days ago, coming from some very 
 basic C++ knowledge, and I'd like some help getting a program 
 to run faster. The code is here: 
 https://github.com/IndigoLily/D-mandelbrot/blob/master/mandelbrot.d

 Right now it runs slower than my JavaScript Mandelbrot renderer 
 on the same quality settings, which is clearly ridiculous, but 
 I don't know what to do to fix it. Sorry for the lack of 
 comments, but I can never tell what will and won't be obvious 
 to other people.
Hey! I happened to also write a Mandelbrot generator in D. It was based on the version given on Rosetta Code for C [0]. Some of the optimizations I used were:

0. Use LDC. It is significantly faster.
1. Utilize the fact that the Mandelbrot set is symmetric about the X axis. You can halve the time taken.
2. Use std.parallelism for using multiple cores on the CPU
3. Use fastmath of LDC
4. imageData.reserve(width * height * 3) before the loop
5. [1] is a great article on this specific topic

For reference, on my 28W 2 core i5, a 2560x1600 image took about 2 minutes to render, with 500,000 iterations per pixel. [2] is my own version.

[0]: https://rosettacode.org/wiki/Mandelbrot_set#PPM_non_interactive
[1]: https://randomascii.wordpress.com/2011/08/13/faster-fractals-through-algebra/
[2]: https://github.com/Sirsireesh/Khoj-2017/blob/master/Mandelbrot-set/mandlebrot.d
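
A rough sketch of points 1 and 2 combined, for readers who want to see the shape of it. This is not the code from [2]; the names (`escapeCount`, `renderCounts`) and the view bounds are assumptions, and the symmetry trick only works as written because the view is centred on the real axis.

```d
import std.parallelism : parallel;
import std.range : iota;

/// Escape-time count for the point (cx, cy). Hypothetical helper.
uint escapeCount(double cx, double cy, uint maxIter)
{
    double x = 0, y = 0;
    uint i = 0;
    while (i < maxIter && x * x + y * y <= 4.0)
    {
        immutable xNew = x * x - y * y + cx;
        y = 2 * x * y + cy;
        x = xNew;
        ++i;
    }
    return i;
}

/// Fills a width*height grid of iteration counts for the view
/// [-2.5, 1] x [-1, 1], computing only the top half of the rows.
uint[] renderCounts(size_t width, size_t height, uint maxIter)
{
    auto counts = new uint[](width * height);
    immutable half = (height + 1) / 2; // rows at or above the middle

    // Each row writes only to itself and its mirror row, so rows are independent
    // and can be handed out to worker threads.
    foreach (py; parallel(iota(half)))
    {
        immutable cy = -1.0 + 2.0 * (py + 0.5) / height; // pixel centre
        immutable mirror = height - 1 - py;              // row with imaginary part -cy
        foreach (px; 0 .. width)
        {
            immutable cx = -2.5 + 3.5 * (px + 0.5) / width;
            immutable n = escapeCount(cx, cy, maxIter);
            counts[py * width + px] = n;
            counts[mirror * width + px] = n; // symmetric about the X axis
        }
    }
    return counts;
}
```

Sampling at pixel centres is what makes row `py` and row `height - 1 - py` exact mirrors of each other; with a different mapping the mirror row may be off by a fraction of a pixel.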
Jan 01
Uknown <sireeshkodali1 gmail.com> writes:
On Tuesday, 2 January 2018 at 07:17:23 UTC, Uknown wrote:
 [snip]
 0. Use LDC. It is significantly faster.
 1. Utilize the fact that the Mandelbrot set is symmetric about 
 the X axis. You can halve the time taken.
 2. Use std.parallelism for using multiple cores on the CPU
 3. Use  fastmath of LDC
 4. imageData.reserve(width * height * 3) before the loop
 5. [1] is a great article on this specific topic
 [snip]
Forgot to mention that since you already know some of the edges, you can avoid unnecessarily looping through some regions. That saves a lot of time.
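
The post does not say which regions are meant. One common way to do this kind of skipping, shown here purely as an assumption, is to test whether a point lies in the main cardioid or the period-2 bulb, since every point there is known to be in the set and would otherwise run to the full iteration limit. `inKnownInterior` is a made-up name.

```d
/// True if (cx, cy) lies in the main cardioid or the period-2 bulb of the
/// Mandelbrot set, where the escape loop would always hit the iteration limit.
bool inKnownInterior(double cx, double cy)
{
    immutable xm = cx - 0.25;
    immutable q = xm * xm + cy * cy;

    // Main cardioid: q * (q + (x - 1/4)) <= y^2 / 4
    immutable inCardioid = q * (q + xm) <= 0.25 * cy * cy;

    // Period-2 bulb: circle of radius 1/4 centred at (-1, 0)
    immutable inBulb = (cx + 1) * (cx + 1) + cy * cy <= 0.0625;

    return inCardioid || inBulb;
}
```

A renderer would call this before the escape loop and, when it returns true, record the maximum iteration count straight away instead of iterating.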
Jan 01