digitalmars.D - Code runtime

Ausprobierer (71/71) Jun 05 2016 I've written this simple piece of code:

ag0aep6g (3/4) Jun 05 2016 Misses -O for actual optimizations. You can also try ldc or gdc for

Ausprobierer (2/6) Jun 05 2016 Still a runtime of approx. 36 seconds.

ag0aep6g (6/13) Jun 05 2016 When I compare `ldc2 -release -O` against `g++ -O`, I get 13 seconds and...

Donald Drumpf (6/18) Jun 05 2016 How many times did you run each? A single run isn't ever really

Ausprobierer (22/42) Jun 05 2016 With

Donald Drumpf (2/25) Jun 05 2016 Well that's embarrassing.

extrawurst (9/80) Jun 05 2016 I never used those methods neither in D nor in C++ but I have a

Ausprobierer (14/17) Jun 05 2016 I think you're right, it probably is the nextPermutation()
H. S. Teoh via Digitalmars-d (7/10) Jun 05 2016 I wrote nextPermutation. It does not allocate.

Andrei Alexandrescu (14/21) Jun 05 2016 FWIW:

Vladimir Panteleev (3/4) Jun 06 2016 FWIW, there's also this:

Seb (6/11) Jun 06 2016 and mir.combinatorics

John Colvin (8/79) Jun 05 2016 On OS X using ldc and clang, I get C++: ~6s and D: ~7s. The

Ausprobierer (3/10) Jun 05 2016 I've compiled and run the code on Win7 x64. I think it's a
Johan Engelen (6/10) Jun 06 2016 +1 !

Max Samukha (2/3) Jun 06 2016 Note that 'long' in D is 64 bits, whereas in MSVC it is 32 bits.

Ausprobierer <c3550227 trbvn.com> writes:

I've written this simple piece of code:

[CODE]
import std.algorithm;
import std.array;
import std.conv;
import std.datetime;
import std.parallelism;
import std.range;
import std.stdio;
import core.atomic;
import core.thread;

void main()
{
     immutable long n = 11;

     long[] z = [1,1,2,2,3,3,4,4,5,5,6,6,0,0];
     long j = 0;
     long l = z.length;

     StopWatch w; w.reset(); w.start;

     do
     {
         if (z[0] == 0) break; else
         {

             long m = 0;
             for (long i = 0; i < z.length; i += 2)
             {
                 m += (z[i]*10 + z[i+1]);
             }
             if (m % n == 0) j++;
         }
     }
     while (nextPermutation(z));

     w.stop();
     j.writeln;
     w.peek().msecs.writeln;
}
[/CODE]

I've compiled it with "dmd -m64 -release -inline <filename>.d".
Output is 58679513, and it takes around 52 seconds to run.

Then I've replicated this code in MSVS2015 C++:

[CODE]
#include <algorithm>
#include <iostream>
#include <ctime>

using namespace std;

const long N = 11;

long Z[] = { 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 0, 0 };
long L = sizeof(Z) / sizeof(Z[0]);

int main(void)
{
	long j = 0;

	clock_t t = clock();

	do {
		if (Z[0] == 0) break;

		long m = 0;
		for (long i = 0; i < L; i += 2) {
			m += (Z[i] * 10 + Z[i + 1]);
		}

		if (m % N == 0) {
			j++;
		}
	} while (next_permutation(Z, Z+L));

	t = clock() - t;

	cout << j << endl;
	cout << (float(t)) / CLOCKS_PER_SEC << endl;

	return 0;
}
[/CODE]

Compile mode is also Release/x64. I've run the program and was 
eagerly surprised, not to say stunned. The output is also 
58679513, but it's done in under 6 seconds!

Care to explain? I haven't found a solution to this :(

Jun 05 2016

ag0aep6g <anonymous example.com> writes:

On 06/05/2016 11:52 PM, Ausprobierer wrote:
 I've compiled it with "dmd -m64 -release -inline <filename>.d".

Misses -O for actual optimizations. You can also try ldc or gdc for 
better performance.

Jun 05 2016

Ausprobierer <c3550227 trbvn.com> writes:

On Sunday, 5 June 2016 at 21:56:30 UTC, ag0aep6g wrote:
 On 06/05/2016 11:52 PM, Ausprobierer wrote:
 I've compiled it with "dmd -m64 -release -inline <filename>.d".

 Misses -O for actual optimizations. You can also try ldc or gdc 
 for better performance.

Still a runtime of approx. 36 seconds.

Jun 05 2016

ag0aep6g <anonymous example.com> writes:

On 06/06/2016 12:00 AM, Ausprobierer wrote:
 On Sunday, 5 June 2016 at 21:56:30 UTC, ag0aep6g wrote:
 On 06/05/2016 11:52 PM, Ausprobierer wrote:
 I've compiled it with "dmd -m64 -release -inline <filename>.d".

 Misses -O for actual optimizations. You can also try ldc or gdc for
 better performance.

 Still a runtime of approx. 36 seconds.

When I compare `ldc2 -release -O` against `g++ -O`, I get 13 seconds and 
9 seconds. Still slower, but it's in the same ballpark.

`dmd -release -O -inline` does significantly worse: 40 seconds. dmd's 
strength are quick compilations, not fast executables (not that the two 
are mutually exclusive).

Jun 05 2016

Donald Drumpf <donald.trump dt.com> writes:

On Sunday, 5 June 2016 at 21:52:20 UTC, Ausprobierer wrote:
 I've written this simple piece of code:

 [CODE]
 import std.algorithm;
 import std.array;
 import std.conv;
 import std.datetime;
 import std.parallelism;
 import std.range;
 import std.stdio;
 import core.atomic;
 import core.thread;

 [...]

How many times did you run each? A single run isn't ever really 
useful. Also, not sure if this would account for the drastic 
difference, but declare long Z[] in global scope over in local 
scope might have some impact.

If I wrote this, it would be faster, I guarantee it.

Jun 05 2016

Ausprobierer <c3550227 trbvn.com> writes:

On Sunday, 5 June 2016 at 21:58:46 UTC, Donald Drumpf wrote:
 On Sunday, 5 June 2016 at 21:52:20 UTC, Ausprobierer wrote:
 I've written this simple piece of code:

 [CODE]
 import std.algorithm;
 import std.array;
 import std.conv;
 import std.datetime;
 import std.parallelism;
 import std.range;
 import std.stdio;
 import core.atomic;
 import core.thread;

 [...]

 How many times did you run each? A single run isn't ever really 
 useful. Also, not sure if this would account for the drastic 
 difference, but declare long Z[] in global scope over in local 
 scope might have some impact.

 If I wrote this, it would be faster, I guarantee it.

With

[CODE]
import std.algorithm;
import std.array;
import std.conv;
import std.datetime;
import std.parallelism;
import std.range;
import std.stdio;
import core.atomic;
import core.thread;

long[] z = [1,1,2,2,3,3,4,4,5,5,6,6,0,0];

void main()
{
     immutable long n = 11;

     long l = z.length;
     long j = 0;
[...]
[/CODE]

and "dmd -m64 -O -inline -release <filename>.d" I still get 40 
seconds runtime.

Jun 05 2016

Donald Drumpf <donald.trump dt.com> writes:

On Sunday, 5 June 2016 at 22:08:21 UTC, Ausprobierer wrote:
 On Sunday, 5 June 2016 at 21:58:46 UTC, Donald Drumpf wrote:
 [...]

 With

 [CODE]
 import std.algorithm;
 import std.array;
 import std.conv;
 import std.datetime;
 import std.parallelism;
 import std.range;
 import std.stdio;
 import core.atomic;
 import core.thread;

 long[] z = [1,1,2,2,3,3,4,4,5,5,6,6,0,0];

 void main()
 {
     immutable long n = 11;

     long l = z.length;
     long j = 0;
 [...]
 [/CODE]

 and "dmd -m64 -O -inline -release <filename>.d" I still get 40 
 seconds runtime.

Well that's embarrassing.

Jun 05 2016

extrawurst <stephan extrawurst.org> writes:

On Sunday, 5 June 2016 at 21:52:20 UTC, Ausprobierer wrote:
 I've written this simple piece of code:

 [CODE]
 import std.algorithm;
 import std.array;
 import std.conv;
 import std.datetime;
 import std.parallelism;
 import std.range;
 import std.stdio;
 import core.atomic;
 import core.thread;

 void main()
 {
     immutable long n = 11;

     long[] z = [1,1,2,2,3,3,4,4,5,5,6,6,0,0];
     long j = 0;
     long l = z.length;

     StopWatch w; w.reset(); w.start;

     do
     {
         if (z[0] == 0) break; else
         {

             long m = 0;
             for (long i = 0; i < z.length; i += 2)
             {
                 m += (z[i]*10 + z[i+1]);
             }
             if (m % n == 0) j++;
         }
     }
     while (nextPermutation(z));

     w.stop();
     j.writeln;
     w.peek().msecs.writeln;
 }
 [/CODE]

 I've compiled it with "dmd -m64 -release -inline <filename>.d".
 Output is 58679513, and it takes around 52 seconds to run.

 Then I've replicated this code in MSVS2015 C++:

 [CODE]
 #include <algorithm>
 #include <iostream>
 #include <ctime>

 using namespace std;

 const long N = 11;

 long Z[] = { 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 0, 0 };
 long L = sizeof(Z) / sizeof(Z[0]);

 int main(void)
 {
 	long j = 0;

 	clock_t t = clock();

 	do {
 		if (Z[0] == 0) break;

 		long m = 0;
 		for (long i = 0; i < L; i += 2) {
 			m += (Z[i] * 10 + Z[i + 1]);
 		}

 		if (m % N == 0) {
 			j++;
 		}
 	} while (next_permutation(Z, Z+L));

 	t = clock() - t;

 	cout << j << endl;
 	cout << (float(t)) / CLOCKS_PER_SEC << endl;

 	return 0;
 }
 [/CODE]

 Compile mode is also Release/x64. I've run the program and was 
 eagerly surprised, not to say stunned. The output is also 
 58679513, but it's done in under 6 seconds!

 Care to explain? I haven't found a solution to this :(

I never used those methods neither in D nor in C++ but I have a 
gut feeling it is the nextPermutation method in phobos, 
allocating internally (it is not  nogc at least) but you can 
compare it easily to c++: 
https://github.com/dlang/phobos/blob/cdc8218e9840353f6ed79f00a63280c3002b198f/std/algorithm/sorting.d#L2786

could be less noise if you benchmark just calling this method vs. 
in C++ and even dig into the disassembly difference there.

--stephan

Jun 05 2016

Ausprobierer <c3550227 trbvn.com> writes:

 I never used those methods neither in D nor in C++ but I have a 
 gut feeling it is the nextPermutation method in phobos, [...]

 --stephan

I think you're right, it probably is the nextPermutation() 
function:

<imports>
long[] z = [1,1,2,2,3,3,4,4,5,5,6,6,0,0];

void main()
{
     do
     {
         if (z[0] == 0) break;
     }
     while (nextPermutation(z));
}

...takes approx. 31 seconds, compiled with "-m64 -O -release 
-inline".

Jun 05 2016

"H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:

On Sun, Jun 05, 2016 at 10:29:09PM +0000, extrawurst via Digitalmars-d wrote:
[...]
 I never used those methods neither in D nor in C++ but I have a gut
 feeling it is the nextPermutation method in phobos, allocating
 internally

I wrote nextPermutation. It does not allocate.

For performance, I recommend using gdc or ldc. It is not dmd's strength.


T

-- 
If blunt statements had a point, they wouldn't be blunt...

Jun 05 2016

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 6/6/16 1:12 AM, H. S. Teoh via Digitalmars-d wrote:
 On Sun, Jun 05, 2016 at 10:29:09PM +0000, extrawurst via Digitalmars-d wrote:
 [...]
 I never used those methods neither in D nor in C++ but I have a gut
 feeling it is the nextPermutation method in phobos, allocating
 internally

 I wrote nextPermutation. It does not allocate.

 For performance, I recommend using gdc or ldc. It is not dmd's strength.

FWIW:

* gdc should be used when benchmarking to ensure comparable backends.

* Just looked at std.permutation 
(https://github.com/dlang/phobos/blob/cdc8218e9840353f6ed79f00a63280c3002b198f/std/algorit
m/sorting.d#L2786), 
looks good but operates on complex compound ranges so it puts more 
pressure on the optimizer. Compare to C++'s std::next_permutation 
(http://stackoverflow.com/questions/11483060/stdnext-permutation-impleme
tation-explanation) 
which directly bumps iterators etc. We also have trouble because of 
ranges' much discussed difficulties with iterators pointing somewhere 
inside them (e.g. we need an extra counter n).

A specialization of nextPermutation for random-access ranges should pay 
off here.


Andrei

Jun 05 2016

Vladimir Panteleev <thecybershadow.lists gmail.com> writes:

On Monday, 6 June 2016 at 06:25:19 UTC, Andrei Alexandrescu wrote:
 * Just looked at std.permutation

FWIW, there's also this:

https://dlang.org/library/std/algorithm/iteration/permutations.html

Jun 06 2016

Seb <seb wilzba.ch> writes:

On Monday, 6 June 2016 at 10:44:10 UTC, Vladimir Panteleev wrote:
 On Monday, 6 June 2016 at 06:25:19 UTC, Andrei Alexandrescu 
 wrote:
 * Just looked at std.permutation

 FWIW, there's also this:

 https://dlang.org/library/std/algorithm/iteration/permutations.html


and mir.combinatorics

http://docs.mir.dlang.io/latest/mir_combinatorics.html

if you are searching for more algorithms with range support. For 
all of these you can use the new Allocator API to allocate the 
initial state.

Jun 06 2016

John Colvin <john.loughran.colvin gmail.com> writes:

On Sunday, 5 June 2016 at 21:52:20 UTC, Ausprobierer wrote:
 I've written this simple piece of code:

 [CODE]
 import std.algorithm;
 import std.array;
 import std.conv;
 import std.datetime;
 import std.parallelism;
 import std.range;
 import std.stdio;
 import core.atomic;
 import core.thread;

 void main()
 {
     immutable long n = 11;

     long[] z = [1,1,2,2,3,3,4,4,5,5,6,6,0,0];
     long j = 0;
     long l = z.length;

     StopWatch w; w.reset(); w.start;

     do
     {
         if (z[0] == 0) break; else
         {

             long m = 0;
             for (long i = 0; i < z.length; i += 2)
             {
                 m += (z[i]*10 + z[i+1]);
             }
             if (m % n == 0) j++;
         }
     }
     while (nextPermutation(z));

     w.stop();
     j.writeln;
     w.peek().msecs.writeln;
 }
 [/CODE]

 I've compiled it with "dmd -m64 -release -inline <filename>.d".
 Output is 58679513, and it takes around 52 seconds to run.

 Then I've replicated this code in MSVS2015 C++:

 [CODE]
 #include <algorithm>
 #include <iostream>
 #include <ctime>

 using namespace std;

 const long N = 11;

 long Z[] = { 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 0, 0 };
 long L = sizeof(Z) / sizeof(Z[0]);

 int main(void)
 {
 	long j = 0;

 	clock_t t = clock();

 	do {
 		if (Z[0] == 0) break;

 		long m = 0;
 		for (long i = 0; i < L; i += 2) {
 			m += (Z[i] * 10 + Z[i + 1]);
 		}

 		if (m % N == 0) {
 			j++;
 		}
 	} while (next_permutation(Z, Z+L));

 	t = clock() - t;

 	cout << j << endl;
 	cout << (float(t)) / CLOCKS_PER_SEC << endl;

 	return 0;
 }
 [/CODE]

 Compile mode is also Release/x64. I've run the program and was 
 eagerly surprised, not to say stunned. The output is also 
 58679513, but it's done in under 6 seconds!

 Care to explain? I haven't found a solution to this :(

On OS X using ldc and clang, I get C++: ~6s and D: ~7s. The 
slowdown in D seems to be due to parts of nextPermutation not 
ending up inlined.

Be careful with benchmarks like this, you are giving the compiler 
a lot more information than it usually has in any real world case 
(here it knows the exact values of all the input 
data/parameters!).

Jun 05 2016

Ausprobierer <c3553265 trbvn.com> writes:

 On OS X using ldc and clang, I get C++: ~6s and D: ~7s. The 
 slowdown in D seems to be due to parts of nextPermutation not 
 ending up inlined.

 Be careful with benchmarks like this, you are giving the 
 compiler a lot more information than it usually has in any real 
 world case (here it knows the exact values of all the input 
 data/parameters!).

I've compiled and run the code on Win7 x64. I think it's a 
compiler issue then, with DMD causing the same slow behaviour on 
Linux x64. If so, I should possibly resort to e. g. LDC.

Jun 05 2016

Johan Engelen <j j.nl> writes:

On Sunday, 5 June 2016 at 22:34:47 UTC, John Colvin wrote:
 Be careful with benchmarks like this, you are giving the 
 compiler a lot more information than it usually has in any real 
 world case (here it knows the exact values of all the input 
 data/parameters!).

+1 !

For the input data, I think you could have a function return the 
data and mark that function with weak linkage so the compiler 
cannot reason about it nor inline it.

-Johan

Jun 06 2016

Max Samukha <maxsamukha gmail.com> writes:

On Sunday, 5 June 2016 at 21:52:20 UTC, Ausprobierer wrote:

 Then I've replicated this code in MSVS2015 C++:

Note that 'long' in D is 64 bits, whereas in MSVC it is 32 bits.

Jun 06 2016

D Programming

C/C++ Programming

Other

digitalmars.D - Code runtime