digitalmars.D.learn - Help optimizing code?

Lily <yulex.42 gmail.com> writes:

I started learning D a few days ago, coming from some very basic 
C++ knowledge, and I'd like some help getting a program to run 
faster. The code is here: 
https://github.com/IndigoLily/D-mandelbrot/blob/master/mandelbrot.d

Right now it runs slower than my JavaScript Mandelbrot renderer 
on the same quality settings, which is clearly ridiculous, but I 
don't know what to do to fix it. Sorry for the lack of comments, 
but I can never tell what will and won't be obvious to other 
people.

Jan 01 2018

Adam D. Ruppe <destructionator gmail.com> writes:

On Monday, 1 January 2018 at 15:09:53 UTC, Lily wrote:
 I started learning D a few days ago, coming from some very 
 basic C++ knowledge, and I'd like some help getting a program 
 to run faster.

So a few easy things you can do:

1) use `float` instead of `real`. real sucks, it is really slow 
and weird. Making that one switch doubled the speed on my 
computer.

2) preallocate the imageData. before the loop, 
`imageData.reserve(width*height*3)`. Small savings on my computer 
but an easy one.

3) make sure you use the compiler optimization options like `-O` 
and `-inline` on dmd (or use the gdc and ldc compilers both of 
which generally optimize better than dmd out of the box).


And if that isn't enough we can look into smaller things, but 
these overall brought the time down to about 1/3 what it started 
on my box.
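A minimal sketch of points 1 and 2, not the code from the linked repository: the `escapeTime` helper, the image size, and the greyscale colouring are assumptions made up for illustration. Only the `float` type and the `reserve` call come from the advice above. Build with `dmd -O -inline` for point 3.

```d
import std.stdio : writeln;

// Hypothetical escape-time kernel. T can be float, double or real,
// so the floating-point type is easy to swap.
uint escapeTime(T)(T cx, T cy, uint maxIter)
{
    T x = 0, y = 0;
    uint i = 0;
    while (i < maxIter && x * x + y * y <= 4)
    {
        immutable T xt = x * x - y * y + cx;
        y = 2 * x * y + cy;
        x = xt;
        ++i;
    }
    return i;
}

void main()
{
    enum width = 320;     // assumed image size
    enum height = 200;
    enum maxIter = 256u;  // assumed iteration limit

    ubyte[] imageData;
    // Point 2: ask for the whole buffer once instead of growing it
    // repeatedly while appending.
    imageData.reserve(width * height * 3);

    foreach (py; 0 .. height)
        foreach (px; 0 .. width)
        {
            // Point 1: do the maths in float rather than real.
            immutable float cx = -2.5f + 3.5f * px / width;
            immutable float cy = -1.0f + 2.0f * py / height;
            immutable uint n = escapeTime!float(cx, cy, maxIter);
            immutable ubyte shade = n == maxIter ? 0 : cast(ubyte)(255 * n / maxIter);
            imageData ~= shade; // R
            imageData ~= shade; // G
            imageData ~= shade; // B
        }

    writeln(imageData.length, " bytes of RGB data");
}
```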

Jan 01 2018

user1234 <user1234 12.nl> writes:

On Monday, 1 January 2018 at 15:23:19 UTC, Adam D. Ruppe wrote:
 On Monday, 1 January 2018 at 15:09:53 UTC, Lily wrote:
 I started learning D a few days ago, coming from some very 
 basic C++ knowledge, and I'd like some help getting a program 
 to run faster.

 So a few easy things you can do:

 1) use `float` instead of `real`. real sucks, it is really slow 
 and weird. Making that one switch doubled the speed on my 
 computer.

Yes, I've also advised double. Double is better if the target arch 
is x86_64, since part of the operations will be done with SSE. 
With "real" the OP was **sure** to get 100% of the maths done in 
the x87 FPU (although for the trig functions there's no choice).

 2) preallocate the imageData. before the loop, 
 `imageData.reserve(width*height*3)`. Small savings on my 
 computer but an easy one.

 3) make sure you use the compiler optimization options like 
 `-O` and `-inline` on dmd (or use the gdc and ldc compilers 
 both of which generally optimize better than dmd out of the 
 box).


 And if that isn't enough we can look into smaller things, but 
 these overall brought the time down to about 1/3 what it 
 started on my box.

Jan 01 2018

user1234 <user1234 12.nl> writes:

On Monday, 1 January 2018 at 15:09:53 UTC, Lily wrote:
 I started learning D a few days ago, coming from some very 
 basic C++ knowledge, and I'd like some help getting a program 
 to run faster. The code is here: 
 https://github.com/IndigoLily/D-mandelbrot/blob/master/mandelbrot.d

 Right now it runs slower than my JavaScript Mandelbrot renderer 
 on the same quality settings, which is clearly ridiculous, but 
 I don't know what to do to fix it. Sorry for the lack of 
 comments, but I can never tell what will and won't be obvious 
 to other people.

- The first thing is to compile with the best options:

     dmd mandelbrot.d -O -release -inline -boundscheck=off

- You append a lot, which can cause reallocs for imageData; Try

    import std.array;
    Appender!(ubyte[]) imageData;

  The code will not have to be changed for "~=" since Appender 
  overloads this operator.

- I'd use "double" instead of "real".
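A small sketch of the Appender suggestion. The buffer size and the pixel values are placeholders; the `.data` step is included on the assumption that the finished pixels are later needed as a plain `ubyte[]`, for example when writing them to a file.

```d
import std.array : appender;
import std.stdio : writeln;

void main()
{
    enum width = 4;   // placeholder size, just for the demonstration
    enum height = 2;

    auto imageData = appender!(ubyte[])();
    imageData.reserve(width * height * 3); // optional: size is known up front

    foreach (i; 0 .. width * height)
    {
        // The same "~=" syntax as with a plain array.
        imageData ~= cast(ubyte) 255; // R
        imageData ~= cast(ubyte) 128; // G
        imageData ~= cast(ubyte) 0;   // B
    }

    // Appender is not itself an array; .data gives the accumulated ubyte[].
    ubyte[] pixels = imageData.data;
    writeln(pixels.length);
}
```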

Jan 01 2018

Adam D. Ruppe <destructionator gmail.com> writes:

On Monday, 1 January 2018 at 15:29:28 UTC, user1234 wrote:
     dmd mandelbrot.d -O -release -inline -boundscheck=off

-O and -inline are OK, but -release and -boundscheck=off are 
harmful and shouldn't be used. Yeah, you can squeeze a bit of 
speed out of them, but there's another way to do it: `.ptr` on 
the individual accesses, or versioning out unwanted `assert` 
statements. Those avoid the major bug and security baggage that 
-release and -boundscheck=off bring.

In this program, I didn't see a major improvement with the 
boundscheck skipping... and in this program, it seems to be 
written without the bugs, but still, I am against that switch on 
principle. It is so so so easy to break things with them.

 - I'd use "double" instead of "real".

On my computer at least, float gave 2x speed compared to double. 
You could try both though and see which works better.
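One way to try the types side by side is a small timing harness like the sketch below. The `escapeTime` and `totalIterations` helpers, the 400x300 grid, and the iteration limit are all assumptions for illustration. The numbers depend heavily on compiler, flags, and CPU, so no expected timings are given.

```d
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : writefln;

// Hypothetical escape-time kernel, templated on the floating-point type.
uint escapeTime(T)(T cx, T cy, uint maxIter)
{
    T x = 0, y = 0;
    uint i = 0;
    while (i < maxIter && x * x + y * y <= 4)
    {
        immutable T xt = x * x - y * y + cx;
        y = 2 * x * y + cy;
        x = xt;
        ++i;
    }
    return i;
}

// Runs the kernel over a small grid and returns the total work done,
// so the optimizer cannot discard the loop.
ulong totalIterations(T)(int width, int height, uint maxIter)
{
    ulong total = 0;
    foreach (py; 0 .. height)
        foreach (px; 0 .. width)
        {
            immutable T cx = cast(T)(-2.5) + cast(T)(3.5) * px / width;
            immutable T cy = cast(T)(-1.0) + cast(T)(2.0) * py / height;
            total += escapeTime!T(cx, cy, maxIter);
        }
    return total;
}

void timeIt(T)(string name)
{
    auto sw = StopWatch(AutoStart.yes);
    immutable total = totalIterations!T(400, 300, 1000);
    sw.stop();
    writefln("%-6s %s ms (%s iterations)", name, sw.peek.total!"msecs", total);
}

void main()
{
    timeIt!float("float");
    timeIt!double("double");
    timeIt!real("real");
}
```

The iteration totals can differ slightly between types because points near the boundary escape at different steps at different precisions.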

Jan 01 2018

Muld <2 2.2> writes:

On Monday, 1 January 2018 at 15:54:33 UTC, Adam D. Ruppe wrote:
 On Monday, 1 January 2018 at 15:29:28 UTC, user1234 wrote:
     dmd mandelbrot.d -O -release -inline -boundscheck=off

 -O and -inline are OK, but -release and -boundscheck are 
 harmful and shouldn't be used. Yeah, you can squeeze a bit of 
 speed out of them, but there's another way to do it - `.ptr` on 
 the individual accesses or versioning out unwanted `assert` 
 statements - and those avoid major bug and security baggage 
 that -release and -boundscheck=off bring.

If you use .ptr then you get zero detection, even in debug builds.

 In this program, I didn't see a major improvement with the 
 boundscheck skipping... and in this program, it seems to be 
 written without the bugs, but still, I am against that switch 
 on principle. It is so so so easy to break things with them.

This program is relatively small and doesn't look like it 
does its calculations in realtime. I'd rather there be a 
potential bug than the program running too slow to be usable, or 
have zero debugging for indices in debug builds.

Jan 01 2018

Adam D. Ruppe <destructionator gmail.com> writes:

On Monday, 1 January 2018 at 16:13:37 UTC, Muld wrote:
 If you use .ptr then you get zero detection, even in debug 
 builds.

It is limited to the one expression where you wrote it, instead 
of on the ENTIRE program like the build switches do.

It is a lot easier to check correctness in an individual 
expression than it is to check the entire program, including 
stuff you didn't even realize might have been a problem.

With the .ptr pattern, it is correct by default and you 
individually change ones you (should) look carefully at. With 
-boundscheck, it is wrong by default and most people don't even 
look at it - people suggest it to newbies as an optimization 
without mentioning how nasty it is.

 I'd rather there be a potential bug than the program running too 
 slow to be usable

That's a ridiculous exaggeration. In this program, I saw a < 1% 
time difference using those flags. -O -inline make a 50x bigger 
difference!

 or have zero debugging for indices in debug builds.

You shouldn't be using .ptr until after you've carefully checked 
and debugged the line of code where you are writing it. That's 
the beauty of the pattern: it only affects one line of code, so 
you can test it before you use it without affecting the rest of 
the program.
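To make the pattern concrete, here is a sketch with two hypothetical functions that differ in a single expression. The function names and the summing loop are invented for illustration. Note that `.ptr` indexing like this is only allowed in `@system` or `@trusted` code, which is D's default.

```d
import std.stdio : writeln;

// Ordinary indexing: bounds-checked unless checks are disabled globally.
ulong sumChecked(const(ubyte)[] data)
{
    ulong total = 0;
    foreach (i; 0 .. data.length)
        total += data[i];
    return total;
}

// The .ptr pattern: only this one expression skips the bounds check.
// It is safe here because the loop bound is data.length itself.
ulong sumUnchecked(const(ubyte)[] data)
{
    ulong total = 0;
    foreach (i; 0 .. data.length)
        total += data.ptr[i];
    return total;
}

void main()
{
    ubyte[] data = [1, 2, 3, 4];
    assert(sumChecked(data) == sumUnchecked(data));
    writeln(sumUnchecked(data)); // prints 10
}
```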

Jan 01 2018

Muld <2 2.2> writes:

On Monday, 1 January 2018 at 16:47:40 UTC, Adam D. Ruppe wrote:
 On Monday, 1 January 2018 at 16:13:37 UTC, Muld wrote:
 If you use .ptr then you get zero detection, even in debug 
 builds.

 It is limited to the one expression where you wrote it, instead 
 of on the ENTIRE program like the build switches do.

 It is a lot easier to check correctness in an individual 
 expression than it is to check the entire program, including 
 stuff you didn't even realize might have been a problem.

 With the .ptr pattern, it is correct by default and you 
 individually change ones you (should) look carefully at. With 
 -boundscheck, it is wrong by default and most people don't even 
 look at it - people suggest it to newbies as an optimization 
 without mentioning how nasty it is.

It won't be just one line, though: you pretty much have to 
use it EVERYWHERE to get the optimization you want. It makes more 
sense to just turn off the check for the entire program and use 
your own asserts where they are actually needed. That way you 
still get the checks in debug builds and have asserts where they 
are actually necessary.

 I'd rather there be a potential bug than the program running 
 too slow to be usable

 That's a ridiculous exaggeration. In this program, I saw a < 1% 
 time difference using those flags. -O -inline make a 50x bigger 
 difference!

Read the sentence right before this.. Jesus. People only read 
what they want.

 or have zero debugging for indices in debug builds.

 You shouldn't be using .ptr until after you've carefully 
 checked and debugged the line of code where you are writing it. 
 That's the beauty of the pattern: it only affects one line of 
 code, so you can test it before you use it without affecting 
 the rest of the program.

It won't just be one line, and that's not beautiful. What happens 
when code gets refactored? Are you constantly going to be 
flip-flopping the source code rather than a compiler flag or 
using multiple build configurations? How long are you even going 
to test for? The error that might happen in the code is probably 
difficult to detect; if it weren't, then having bounds checking at 
all wouldn't be necessary. Just test your code, that's the beauty 
of testing!

Jan 01 2018

Uknown <sireeshkodali1 gmail.com> writes:

On Monday, 1 January 2018 at 15:09:53 UTC, Lily wrote:
 I started learning D a few days ago, coming from some very 
 basic C++ knowledge, and I'd like some help getting a program 
 to run faster. The code is here: 
 https://github.com/IndigoLily/D-mandelbrot/blob/master/mandelbrot.d

 Right now it runs slower than my JavaScript Mandelbrot renderer 
 on the same quality settings, which is clearly ridiculous, but 
 I don't know what to do to fix it. Sorry for the lack of 
 comments, but I can never tell what will and won't be obvious 
 to other people.

Hey! I happened to also write a Mandelbrot generator in D. It was 
based on the version given on Rosetta Code for C[0].
Some of the optimizations I used were:

0. Use LDC. It is significantly faster.
1. Utilize the fact that the Mandelbrot set is symmetric about 
the X axis. You can halve the time taken.
2. Use std.parallelism to use multiple cores on the CPU
3. Use LDC's fastmath (the `@fastmath` attribute)
4. imageData.reserve(width * height * 3) before the loop
5. [1] is a great article on this specific topic

For reference, on my 28W 2 core i5, a 2560x1600 image took about 
2 minutes to render, with 500,000 iterations per pixel.
[2] is my own version.

[0]: 
https://rosettacode.org/wiki/Mandelbrot_set#PPM_non_interactive
[1]: 
https://randomascii.wordpress.com/2011/08/13/faster-fractals-through-algebra/
[2]: 
https://github.com/Sirsireesh/Khoj-2017/blob/master/Mandelbrot-set/mandlebrot.d
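Points 1 and 2 can be combined roughly as in the sketch below. This is not the code from [2]: the `escapeTime` helper, the image size, and the viewport are assumptions. The mirroring only works because the assumed view is centred on the real axis and `height` is even, so row `py` and row `height - 1 - py` sample exactly conjugate points.

```d
import std.parallelism : parallel;
import std.range : iota;
import std.stdio : writeln;

// Hypothetical escape-time kernel.
uint escapeTime(double cx, double cy, uint maxIter)
{
    double x = 0, y = 0;
    uint i = 0;
    while (i < maxIter && x * x + y * y <= 4)
    {
        immutable xt = x * x - y * y + cx;
        y = 2 * x * y + cy;
        x = xt;
        ++i;
    }
    return i;
}

void main()
{
    enum width = 300;     // assumed image size
    enum height = 200;    // must be even for the mirroring below
    enum maxIter = 500u;  // assumed iteration limit

    auto iters = new uint[](width * height);

    // Only the top half is computed. Rows are independent, so each
    // parallel task writes to its own pair of rows.
    foreach (py; parallel(iota(height / 2)))
    {
        // Sample at pixel centres so the two halves mirror exactly.
        immutable cy = -1.0 + 2.0 * (py + 0.5) / height;
        immutable mirror = height - 1 - py;
        foreach (px; 0 .. width)
        {
            immutable cx = -2.5 + 3.5 * px / width;
            immutable n = escapeTime(cx, cy, maxIter);
            iters[py * width + px] = n;
            iters[mirror * width + px] = n; // conjugate point, same count
        }
    }

    writeln("computed ", height / 2, " rows, filled ", height);
}
```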

Jan 01 2018

Uknown <sireeshkodali1 gmail.com> writes:

On Tuesday, 2 January 2018 at 07:17:23 UTC, Uknown wrote:
 [snip]
 0. Use LDC. It is significantly faster.
 1. Utilize the fact that the Mandelbrot set is symmetric about 
 the X axis. You can halve the time taken.
 2. Use std.parallelism to use multiple cores on the CPU
 3. Use LDC's fastmath (the `@fastmath` attribute)
 4. imageData.reserve(width * height * 3) before the loop
 5. [1] is a great article on this specific topic
 [snip]

Forgot to mention that since you already know some of the edges, 
you can avoid unnecessarily looping through some regions. That 
saves a lot of time.
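One common way to skip known regions, which may or may not be exactly what was meant here, is to test each point against the main cardioid and the period-2 bulb before iterating. Both regions lie entirely inside the set, so such points would otherwise run to the full iteration limit. The function names below are made up for this sketch.

```d
import std.stdio : writeln;

// True if c = cx + i*cy lies in the main cardioid or the period-2 bulb,
// both of which are known to be inside the Mandelbrot set.
bool inCardioidOrBulb(double cx, double cy)
{
    immutable xs = cx - 0.25;
    immutable q = xs * xs + cy * cy;
    if (q * (q + xs) <= 0.25 * cy * cy)
        return true; // main cardioid
    immutable xb = cx + 1.0;
    return xb * xb + cy * cy <= 0.0625; // disc of radius 1/4 around -1
}

// Hypothetical escape-time kernel with the early exit added.
uint escapeTime(double cx, double cy, uint maxIter)
{
    if (inCardioidOrBulb(cx, cy))
        return maxIter; // skip the loop entirely
    double x = 0, y = 0;
    uint i = 0;
    while (i < maxIter && x * x + y * y <= 4)
    {
        immutable xt = x * x - y * y + cx;
        y = 2 * x * y + cy;
        x = xt;
        ++i;
    }
    return i;
}

void main()
{
    writeln(inCardioidOrBulb(0.0, 0.0));  // true: inside the cardioid
    writeln(inCardioidOrBulb(-1.0, 0.0)); // true: centre of the bulb
    writeln(inCardioidOrBulb(1.0, 1.0));  // false: must be iterated
}
```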

Jan 01 2018
