
digitalmars.D.learn - Need for speed

reply Nestor <nestor barriolinux.es> writes:
I am a Python programmer and I am enjoying Dlang and learning 
some programming insights along the way; thanks, everyone.

I have no formal education and also program JS and PHP.

Watching a video where a guy writes some simple code in Python 
and the same code in Go and compares their speed, I thought it 
could be a nice exercise for my learning path, so I ported the 
code to Dlang (hopefully correctly).

I was hoping to beat my dear Python and get results similar to 
Go, but that is not the case, neither using rdmd nor running the 
executable generated by dmd. I am getting values between 350-380 
ms, versus 81 ms in Python.

1. Am I doing something wrong in my code?
2. Do I have wrong expectations about Dlang?

Thanks in advance.

This is the video: https://www.youtube.com/watch?v=1Sban1F45jQ
This is my D code:
```
import std.stdio;
import std.random;
import std.datetime.stopwatch : benchmark, StopWatch, AutoStart;
import std.algorithm;

void main()
{
     auto sw = StopWatch(AutoStart.no);
     sw.start();
     int[] mylist;
     for (int number = 0; number < 100000; ++number)
     {
         auto rnd = Random(unpredictableSeed);
         auto n = uniform(0, 100, rnd);
         mylist ~= n;
     }
     mylist.sort();
     sw.stop();
     long msecs = sw.peek.total!"msecs";
     writefln("%s", msecs);
}
```

```
import time
import random

start = time.time()
mylist = []
for _ in range(100000):
     mylist.append(random.randint(0,100))
mylist.sort()
end = time.time()
print(f"{(end-start)*1000}ms")
```
Apr 01
next sibling parent reply ag0aep6g <anonymous example.com> writes:
On Thursday, 1 April 2021 at 16:52:17 UTC, Nestor wrote:
 I was hoping to beat my dear Python and get similar results to 
 Go, but that is not the case neither using rdmd nor running the 
 executable generated by dmd. I am getting values between 
 350-380 ms, and 81ms in Python.
[...]
 ```
     for (int number = 0; number < 100000; ++number)
     {
         auto rnd = Random(unpredictableSeed);
         auto n = uniform(0, 100, rnd);
         mylist ~= n;
     }
 ```

 ```
 for _ in range(100000):
     mylist.append(random.randint(0,100))
 ```
In the D version, you're re-seeding the random number generator on every loop. That takes time. You're not doing that in the Python version. Move `auto rnd = ...;` out of the loop, and you will get better times. Or just use the default generator with `uniform(0, 100)`.
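A minimal sketch of that fix, seeding once outside the loop (the `randomSorted` helper name is mine, not from the original post):

```d
import std.algorithm.sorting : isSorted, sort;
import std.random : Random, uniform, unpredictableSeed;

int[] randomSorted(size_t count)
{
    auto rnd = Random(unpredictableSeed); // seed once, not per iteration
    auto list = new int[count];           // length is known, so preallocate
    foreach (ref e; list)
        e = uniform(0, 100, rnd);
    list.sort();
    return list;
}

void main()
{
    auto list = randomSorted(100_000);
    assert(list.length == 100_000);
    assert(list.isSorted);
}
```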
Apr 01
parent =?UTF-8?Q?Ali_=c3=87ehreli?= <acehreli yahoo.com> writes:
On 4/1/21 10:15 AM, ag0aep6g wrote:

 Move `auto rnd = ...;` out of the loop, and you will get better times.
Doing that reduces the time about 15-fold. Using Appender reduces it further a tiny bit:

```
import std.array;
// ...
    Appender!(int[]) mylist;
// ...
    mylist.data.sort();
```

Ali
Apr 01
prev sibling next sibling parent reply Imperatorn <johan_forsberg_86 hotmail.com> writes:
On Thursday, 1 April 2021 at 16:52:17 UTC, Nestor wrote:
 I am a python programmer and I am enjoying Dlang and learning 
 some programming insights on the way, thank everyone.

 [...]
Could you also post the python code for comparison?
Apr 01
parent Imperatorn <johan_forsberg_86 hotmail.com> writes:
On Thursday, 1 April 2021 at 17:16:06 UTC, Imperatorn wrote:
 On Thursday, 1 April 2021 at 16:52:17 UTC, Nestor wrote:
 I am a python programmer and I am enjoying Dlang and learning 
 some programming insights on the way, thank everyone.

 [...]
Could you also post the python code for comparison?
Omg I totally missed it lol 😂
Apr 01
prev sibling next sibling parent Imperatorn <johan_forsberg_86 hotmail.com> writes:
On Thursday, 1 April 2021 at 16:52:17 UTC, Nestor wrote:
 I am a python programmer and I am enjoying Dlang and learning 
 some programming insights on the way, thank everyone.

 I have no formal education and also program JS and PHP.

 Watching a video where a guy programs some simple code in 
 Python and the same code in Go and compares speed I thought 
 that could be some nice exercise for my learning path and 
 successfully ported code to Dlang (hope so)

 I was hoping to beat my dear Python and get similar results to 
 Go, but that is not the case neither using rdmd nor running the 
 executable generated by dmd. I am getting values between 
 350-380 ms, and 81ms in Python.

 1- I am doing something wrong in my code?
 2- Do I have wrong expectations about Dlang?

 Thanks in advance.

 This is the video: https://www.youtube.com/watch?v=1Sban1F45jQ
 This is my D code:
 ```
 import std.stdio;
 import std.random;
 import std.datetime.stopwatch : benchmark, StopWatch, AutoStart;
 import std.algorithm;

 void main()
 {
     auto sw = StopWatch(AutoStart.no);
     sw.start();
     int[] mylist;
     for (int number = 0; number < 100000; ++number)
     {
         auto rnd = Random(unpredictableSeed);
         auto n = uniform(0, 100, rnd);
         mylist ~= n;
     }
     mylist.sort();
     sw.stop();
     long msecs = sw.peek.total!"msecs";
     writefln("%s", msecs);
 }
 ```

 ```
 import time
 import random

 start = time.time()
 mylist = []
 for _ in range(100000):
     mylist.append(random.randint(0,100))
 mylist.sort()
 end = time.time()
 print(f"{(end-start)*1000}ms")
 ```
Now when I actually read what you wrote, I tested moving `auto rnd = Random(unpredictableSeed)` outside the loop and got 481 ms for the first version vs 34 ms for the other.

---
auto sw = StopWatch(AutoStart.no);
sw.start();
int[] mylist;
auto rnd = Random(unpredictableSeed);
for (int number = 0; number < 100000; ++number)
{
    auto n = uniform(0, 100, rnd);
    mylist ~= n;
}
mylist.sort();
sw.stop();
long msecs = sw.peek.total!"msecs";
writefln("%s", msecs);
---

Also, one thing you could do is parallelize for even faster performance. I can show you how to do that later if you want to.
Apr 01
prev sibling next sibling parent reply Chris Piker <chris hoopjump.com> writes:
On Thursday, 1 April 2021 at 16:52:17 UTC, Nestor wrote:
 I was hoping to beat my dear Python and get similar results to 
 Go, but that is not the case neither using rdmd nor running the 
 executable generated by dmd. I am getting values between 
 350-380 ms, and 81ms in Python.
Nice test. I'm new to D as well and can't comment on needed refactoring. To confirm your results I compiled the D example using:

```
gdc -O2 speed.d -o speed
```

and measured 129 ms for the D program and 63 ms for the python3 equivalent.

I'll be keen to see how this plays out, since I'm using D as a faster alternative to Python.
Apr 01
next sibling parent Imperatorn <johan_forsberg_86 hotmail.com> writes:
On Thursday, 1 April 2021 at 17:30:15 UTC, Chris Piker wrote:
 On Thursday, 1 April 2021 at 16:52:17 UTC, Nestor wrote:
 I was hoping to beat my dear Python and get similar results to 
 Go, but that is not the case neither using rdmd nor running 
 the executable generated by dmd. I am getting values between 
 350-380 ms, and 81ms in Python.
 Nice test. I'm new to D as well and can't comment on needed 
 refactoring. To confirm your results I compiled the D example 
 using:

 ```
 gdc -O2 speed.d -o speed
 ```

 and measured 129 ms for the D program and 63 ms for the python3 
 equivalent. I'll be keen to see how this plays out since I'm 
 using D as a faster alternative to python.
If I make more changes in D I get below 10ms
Apr 01
prev sibling parent Imperatorn <johan_forsberg_86 hotmail.com> writes:
On Thursday, 1 April 2021 at 17:30:15 UTC, Chris Piker wrote:
 On Thursday, 1 April 2021 at 16:52:17 UTC, Nestor wrote:
 I was hoping to beat my dear Python and get similar results to 
 Go, but that is not the case neither using rdmd nor running 
 the executable generated by dmd. I am getting values between 
 350-380 ms, and 81ms in Python.
 Nice test. I'm new to D as well and can't comment on needed 
 refactoring. To confirm your results I compiled the D example 
 using:

 ```
 gdc -O2 speed.d -o speed
 ```

 and measured 129 ms for the D program and 63 ms for the python3 
 equivalent. I'll be keen to see how this plays out since I'm 
 using D as a faster alternative to python.
I have made 4 variants of the code and get:

- 20 ms (mylist ~= n and mylist.sort)
- 8 ms (mylist[number] = n and concurrent sort)
- 10 ms (parallel assignment and mylist.sort)
- 5 ms (parallel assignment and concurrent sort)

Also, make sure you build your application in release mode; this made quite some difference.
Apr 01
prev sibling next sibling parent reply Berni44 <someone somemail.com> writes:
On Thursday, 1 April 2021 at 16:52:17 UTC, Nestor wrote:
 I was hoping to beat my dear Python and get similar results to 
 Go, but that is not the case neither using rdmd nor running the 
 executable generated by dmd. I am getting values between 
 350-380 ms, and 81ms in Python.
Try using ldc2 instead of dmd:

```
ldc2 -O3 -release -boundscheck=off -flto=full -defaultlib=phobos2-ldc-lto,druntime-ldc-lto speed.d
```

should produce much better results.
Apr 01
next sibling parent Imperatorn <johan_forsberg_86 hotmail.com> writes:
On Thursday, 1 April 2021 at 19:00:08 UTC, Berni44 wrote:
 On Thursday, 1 April 2021 at 16:52:17 UTC, Nestor wrote:
 I was hoping to beat my dear Python and get similar results to 
 Go, but that is not the case neither using rdmd nor running 
 the executable generated by dmd. I am getting values between 
 350-380 ms, and 81ms in Python.
 Try using ldc2 instead of dmd:

 ```
 ldc2 -O3 -release -boundscheck=off -flto=full 
 -defaultlib=phobos2-ldc-lto,druntime-ldc-lto speed.d
 ```

 should produce much better results.

It did! Tried those flags with dmd and ldc and got the following times (ms) for the approaches I had earlier (two runs each):

```
DMD 11 7  6 4
DMD 15 7 10 6
LDC  6 7  9 6
LDC 12 6  8 5
```
It did! Tried those flags with dmd and ldc and got the following (ms) for the approaches I had earlier (made two runs for each) DMD 11 7 6 4 DMD 15 7 10 6 LDC 6 7 9 6 LDC 12 6 8 5
Apr 01
prev sibling next sibling parent reply matheus <matheus gmail.com> writes:
On Thursday, 1 April 2021 at 19:00:08 UTC, Berni44 wrote:
 Try using ldc2 instead of dmd:

 ```
 ldc2 -O3 -release -boundscheck=off -flto=full 
 -defaultlib=phobos2-ldc-lto,druntime-ldc-lto speed.d
 ```

 should produce much better results.
Since this is the "Learn" part of the forum, be careful with `-boundscheck=off`.

I mean, for this little snippet it's OK, but for other projects it may be wrong, as it says here: https://dlang.org/dmd-windows.html#switch-boundscheck

"This option should be used with caution and as a last resort to improve performance. Confirm turning off @safe bounds checks is worthwhile by benchmarking."

Matheus.
Apr 01
parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Thu, Apr 01, 2021 at 07:25:53PM +0000, matheus via Digitalmars-d-learn wrote:
[...]
 Since this is a "Learn" part of the Foruam, be careful with
 "-boundscheck=off".
 
 I mean for this little snippet is OK, but for a other projects this my
 be wrong, and as it says here:
 https://dlang.org/dmd-windows.html#switch-boundscheck
 
 "This option should be used with caution and as a last resort to
 improve performance. Confirm turning off  safe bounds checks is
 worthwhile by benchmarking."
[...]

It's interesting that whenever a question about D's performance pops up in the forums, people tend to reach for optimization flags. I wouldn't say it doesn't help; but I've found that significant performance improvements can usually be obtained by examining the code first, and catching common newbie mistakes. Those usually account for the majority of the observed performance degradation. Only after the code has been cleaned up and obvious mistakes fixed is it worth reaching for optimization flags, IMO.

Common mistakes I've noticed include:

- Constructing large arrays by appending 1 element at a time with `~`. Obviously, this requires many array reallocations and the associated copying; not to mention greatly increased GC load that could have been easily avoided by preallocation or by using std.array.appender.

- Failing to move repeated computations (esp. inefficient ones) outside the inner loop. Sometimes a good optimizing compiler is able to hoist them out automatically, but not always.

- Constructing lots of temporaries in inner loops as heap-allocated classes instead of by-value structs: the former leads to heavy GC load, not to mention memory allocation is generally slow and should be avoided inside inner loops. Heap-allocated objects also require indirections, which slow things down even more. By-value structs can be passed around in registers: no GC pressure, no indirections; so they can significantly improve performance.

- Using O(N^2) (or other super-linear) algorithms on large data sets where a more efficient algorithm is available. This one ought to speak for itself. :-D Nevertheless it still crops up from time to time, so it deserves to be mentioned again.

T

-- 
Those who don't understand Unix are condemned to reinvent it, poorly.
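As a rough sketch of the first point (the helper names here are mine, and the exact timings will vary by machine):

```d
import std.array : appender;
import std.datetime.stopwatch : StopWatch, AutoStart;
import std.stdio : writefln;

enum N = 1_000_000;

// Appends one element at a time; may trigger repeated reallocations.
int[] byConcat()
{
    int[] a;
    foreach (i; 0 .. N)
        a ~= i;
    return a;
}

// Reserves the full capacity up front via std.array.appender.
int[] byAppender()
{
    auto app = appender!(int[]);
    app.reserve(N);
    foreach (i; 0 .. N)
        app.put(i);
    return app.data;
}

void main()
{
    auto sw = StopWatch(AutoStart.yes);
    auto a = byConcat();
    writefln("~=       : %s usecs", sw.peek.total!"usecs");

    sw.reset();
    auto b = byAppender();
    writefln("appender : %s usecs", sw.peek.total!"usecs");

    assert(a == b); // both strategies produce the same array
}
```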
Apr 01
next sibling parent reply =?UTF-8?Q?Ali_=c3=87ehreli?= <acehreli yahoo.com> writes:
On 4/1/21 12:55 PM, H. S. Teoh wrote:

 - Constructing large arrays by appending 1 element at a time with `~`.
    Obviously, this requires many array reallocations and the associated
    copying
And that may not be a contributing factor. :) The following program sees just 15 allocations and 1722 element copies for 1 million appending operations:

```
import std.stdio;

void main() {
    int[] arr;
    auto place = arr.ptr;
    size_t relocated = 0;
    size_t copied = 0;

    foreach (i; 0 .. 1_000_000) {
        arr ~= i;

        if (arr.ptr != place) {
            ++relocated;
            copied += arr.length - 1;
            place = arr.ptr;
        }
    }

    writeln("relocated: ", relocated);
    writeln("copied   : ", copied);
}
```

This is because the GC does not allocate if there are unused pages right after the array. (However, increasing the element count to 10 million increases allocations slightly to 18, but element copies jump to 8 million.)

Ali
Apr 01
parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Thu, Apr 01, 2021 at 01:17:15PM -0700, Ali ehreli via Digitalmars-d-learn
wrote:
 On 4/1/21 12:55 PM, H. S. Teoh wrote:
 
 - Constructing large arrays by appending 1 element at a time with
 `~`.  Obviously, this requires many array reallocations and the
 associated copying
 And that may not be a contributing factor. :) The following 
 program sees just 15 allocations and 1722 element copies for 1 
 million appending operations:
[...]
 This is because the GC does not allocate if there are unused pages
 right after the array.
Right, but in a typical program it's unpredictable whether there will be unused pages after the array.
 (However, increasing the element count to 10 million increases
 allocations slightly to 18 but element copies jump to 8 million.)
[...]

Thanks for the very interesting information; so it looks like most of the time is actually spent copying array elements rather than doing anything else!

T

-- 
Let's eat some disquits while we format the biskettes.
Apr 01
parent reply Imperatorn <johan_forsberg_86 hotmail.com> writes:
On Thursday, 1 April 2021 at 21:13:18 UTC, H. S. Teoh wrote:
 On Thu, Apr 01, 2021 at 01:17:15PM -0700, Ali Çehreli via 
 Digitalmars-d-learn wrote:
 [...]
[...]
 [...]
Right, but in a typical program it's unpredictable whether there will be unused pages after the array.
 [...]
 [...]

 Thanks for the very interesting information; so it looks like 
 most of the time spent is actually in copying array elements 
 than anything else!

 T
Sorting takes longer proportionally though
Apr 01
parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Thu, Apr 01, 2021 at 09:16:09PM +0000, Imperatorn via Digitalmars-d-learn
wrote:
 On Thursday, 1 April 2021 at 21:13:18 UTC, H. S. Teoh wrote:
[...]
 Thanks for the very interesting information; so it looks like most
 of the time spent is actually in copying array elements than
 anything else!
[...]
 Sorting takes longer proportionally though
I meant that most of the time incurred by appending to the array element by element is spent copying elements. Obviously, sorting will not be as fast as copying array elements.

T

-- 
What do you mean the Internet isn't filled with subliminal messages? What about all those buttons marked "submit"??
Apr 01
prev sibling parent reply Jon Degenhardt <jond noreply.com> writes:
On Thursday, 1 April 2021 at 19:55:05 UTC, H. S. Teoh wrote:
 On Thu, Apr 01, 2021 at 07:25:53PM +0000, matheus via 
 Digitalmars-d-learn wrote: [...]
 Since this is a "Learn" part of the Foruam, be careful with 
 "-boundscheck=off".
 
 I mean for this little snippet is OK, but for a other projects 
 this my be wrong, and as it says here: 
 https://dlang.org/dmd-windows.html#switch-boundscheck
 
 "This option should be used with caution and as a last resort 
 to improve performance. Confirm turning off  safe bounds 
 checks is worthwhile by benchmarking."
 [...]

 It's interesting that whenever a question about D's performance 
 pops up in the forums, people tend to reach for optimization 
 flags. I wouldn't say it doesn't help; but I've found that 
 significant performance improvements can usually be obtained by 
 examining the code first, and catching common newbie mistakes. 
 Those usually account for the majority of the observed 
 performance degradation.

 Only after the code has been cleaned up and obvious mistakes 
 fixed, is it worth reaching for optimization flags, IMO.
This is my experience as well, and not just for D. Pick good algorithms and pay attention to memory allocation. Don't go crazy on the latter: many people try to avoid the GC at all costs, but I don't usually find it necessary to go quite that far. Very often, simply reusing already-allocated memory does the trick.

The blog post I wrote a few years ago focuses on these ideas: https://dlang.org/blog/2017/05/24/faster-command-line-tools-in-d/

--Jon
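A tiny sketch of the reuse idea (the `squares*` helpers are hypothetical; the point is that the buffer is allocated once, outside the loop):

```d
// Allocating inside the loop: one GC allocation per call.
int[] squaresAllocating(int n)
{
    auto tmp = new int[n];
    foreach (i, ref e; tmp)
        e = cast(int)(i * i);
    return tmp;
}

// Reusing a caller-provided buffer: no per-call allocation.
int[] squaresReusing(int[] buf, int n)
{
    assert(buf.length >= n);
    auto tmp = buf[0 .. n]; // slice into the preallocated buffer
    foreach (i, ref e; tmp)
        e = cast(int)(i * i);
    return tmp;
}

void main()
{
    auto buf = new int[64]; // allocated once, recycled below
    foreach (iter; 0 .. 100_000)
    {
        auto s = squaresReusing(buf, 32);
        assert(s[5] == 25);
    }
}
```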
Apr 01
parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Fri, Apr 02, 2021 at 02:36:21AM +0000, Jon Degenhardt via
Digitalmars-d-learn wrote:
 On Thursday, 1 April 2021 at 19:55:05 UTC, H. S. Teoh wrote:
[...]
 It's interesting that whenever a question about D's performance pops
 up in the forums, people tend to reach for optimization flags.  I
 wouldn't say it doesn't help; but I've found that significant
 performance improvements can usually be obtained by examining the
 code first, and catching common newbie mistakes.  Those usually
 account for the majority of the observed performance degradation.
 
 Only after the code has been cleaned up and obvious mistakes fixed,
 is it worth reaching for optimization flags, IMO.
 This is my experience as well, and not just for D. Pick good 
 algorithms and pay attention to memory allocation. Don't go 
 crazy on the latter. Many people try to avoid GC at all costs, 
 but I don't usually find it necessary to go quite that far. Very 
 often simply reusing already allocated memory does the trick.
I've been saying this for years: the GC is (usually) not evil. It's often quite easy to optimize away the main bottlenecks, and any remaining problem becomes not so important anymore. For example, see this thread:

https://forum.dlang.org/post/mailman.1589.1415314819.9932.digitalmars-d puremagic.com

which is continued here (for some reason it was split -- the bad ole Mailman bug, IIRC):

https://forum.dlang.org/post/mailman.1590.1415315739.9932.digitalmars-d puremagic.com
From a starting point of about 20 seconds total running time, I reduced it to about 6 seconds by the following fixes:

1) Reduce GC collection frequency: call GC.disable at the start of the program, then manually call GC.collect periodically.
2) Eliminate autodecoding (using .representation or .byChar).
3) Rewrite a hot inner loop using pointers instead of .countUntil.
4) Refactor the code to eliminate a redundant computation from an inner loop.
5) Judicious use of .assumeSafeAppend to prevent excessive array reallocations.
6) (Not described in the thread, but applied later) Reduce GC load even further by reusing an array that was being allocated per iteration in an inner loop before.

Of the above, (1), (2), (3), and (5) require only very small code changes. (4) and (6) were a little more tricky, but were pretty localised changes that did not take a long time to implement or affect a lot of code. They were all implemented in a short span of 2-3 days. Compare this with outright writing @nogc code, which would require a LOT more time & effort.
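A sketch of technique (1), using `core.memory.GC` (note the actual API calls are `GC.disable`/`GC.enable`; the loop body here is just a stand-in for real allocation-heavy work):

```d
import core.memory : GC;

size_t doWork()
{
    GC.disable();             // suspend automatic collections
    scope (exit) GC.enable(); // restore normal behaviour on exit

    size_t done;
    foreach (i; 0 .. 1_000)
    {
        // Stand-in for allocation-heavy work.
        auto chunk = new ubyte[64 * 1024];
        chunk[0] = cast(ubyte) i;
        ++done;

        // Collect at a frequency we choose, rather than
        // whenever the GC decides to interrupt us.
        if (i % 100 == 0)
            GC.collect();
    }
    return done;
}

void main()
{
    assert(doWork() == 1_000);
}
```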
 The blog post I wrote a few years ago focuses on these ideas:
 https://dlang.org/blog/2017/05/24/faster-command-line-tools-in-d/
Very nice, and matches my experience with optimizing D code.

T

-- 
Live a century, learn a century; you'll still die a fool. (Russian proverb)
Apr 01
parent reply =?UTF-8?Q?Ali_=c3=87ehreli?= <acehreli yahoo.com> writes:
On 4/1/21 9:01 PM, H. S. Teoh wrote:

 6) (Not described in the thread, but applied later) Reduce GC load even
     further by reusing an array that was being allocated per iteration in
     an inner loop before.
For those who prefer a video description with some accent :) here is how to apply that technique:

https://www.youtube.com/watch?v=dRORNQIB2wA&t=787s

And here is how to profile[1] the program to see where the allocations occur:

https://www.youtube.com/watch?v=dRORNQIB2wA&t=630s

Ali

[1] Unfortunately, the profiler has some bugs, which cause segmentation faults in some cases.
Apr 02
parent drug <drug2004 bk.ru> writes:
02.04.2021 15:06, Ali Çehreli writes:
 
 For those who prefer a video description with some accent :) here is how 
Speaking of accents, I'm curious what you would say about this old Russian sketch about English and its dialects (in English, no Facebook account required):

https://www.facebook.com/ABBYY.Lingvo/videos/954190547976075

Skip the first 50 seconds to get to the English part.
Apr 02
prev sibling parent reply ag0aep6g <anonymous example.com> writes:
On 01.04.21 21:00, Berni44 wrote:
 ```
 ldc2 -O3 -release -boundscheck=off -flto=full 
 -defaultlib=phobos2-ldc-lto,druntime-ldc-lto speed.d
 ```
Please don't recommend `-boundscheck=off` to newbies. It's not just an optimization. It breaks @safe. If you want to do welding without eye protection, that's on you. But please don't recommend it to the new guy.
Apr 01
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 4/1/21 3:27 PM, ag0aep6g wrote:
 On 01.04.21 21:00, Berni44 wrote:
 ```
 ldc2 -O3 -release -boundscheck=off -flto=full 
 -defaultlib=phobos2-ldc-lto,druntime-ldc-lto speed.d
 ```
 Please don't recommend `-boundscheck=off` to newbies. It's not 
 just an optimization. It breaks @safe. If you want to do welding 
 without eye protection, that's on you. But please don't 
 recommend it to the new guy.
Yes, but you can recommend `-boundscheck=safeonly`, which leaves bounds checking on for @safe code. Though I personally leave it on for everything.

-Steve
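To illustrate what `safeonly` keeps: in @safe code an out-of-bounds index still fails loudly instead of silently corrupting memory. A minimal sketch (the `readAt` helper is mine; built without `-boundscheck=off`):

```d
import core.exception : RangeError;
import std.exception : assertThrown;

@safe int readAt(int[] a, size_t i)
{
    return a[i]; // bounds-checked in @safe code under -boundscheck=safeonly
}

void main() // @system harness, so we are allowed to catch the RangeError
{
    auto a = [10, 20, 30];
    assert(readAt(a, 2) == 30);            // in bounds: fine
    assertThrown!RangeError(readAt(a, 3)); // out of bounds: fails loudly,
                                           // not undefined behaviour
}
```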
Apr 01
parent reply ag0aep6g <anonymous example.com> writes:
On 01.04.21 21:36, Steven Schveighoffer wrote:
 On 4/1/21 3:27 PM, ag0aep6g wrote:
 On 01.04.21 21:00, Berni44 wrote:
 ```
 ldc2 -O3 -release -boundscheck=off -flto=full 
 -defaultlib=phobos2-ldc-lto,druntime-ldc-lto speed.d
 ```
[...]
 Yes, but you can recommend `-boundscheck=safeonly`, which leaves it on 
 for  safe code.
`-O -release` already does that, doesn't it?
Apr 01
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 4/1/21 3:44 PM, ag0aep6g wrote:
 On 01.04.21 21:36, Steven Schveighoffer wrote:
 On 4/1/21 3:27 PM, ag0aep6g wrote:
 On 01.04.21 21:00, Berni44 wrote:
 ```
 ldc2 -O3 -release -boundscheck=off -flto=full 
 -defaultlib=phobos2-ldc-lto,druntime-ldc-lto speed.d
 ```
[...]
 Yes, but you can recommend `-boundscheck=safeonly`, which leaves it on 
 for  safe code.
`-O -release` already does that, doesn't it?
Maybe, but I wasn't responding to that, just to your statement not to recommend -boundscheck=off. In any case, it wouldn't hurt, right? I don't know what -O3 and -release do on ldc.

-Steve
Apr 01
parent ag0aep6g <anonymous example.com> writes:
On 01.04.21 21:53, Steven Schveighoffer wrote:
 Maybe, but I wasn't responding to that, just your statement not to 
 recommend -boundscheck=off. In any case, it wouldn't hurt, right?
Right.
Apr 01
prev sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Thu, Apr 01, 2021 at 04:52:17PM +0000, Nestor via Digitalmars-d-learn wrote:
[...]
 ```
 import std.stdio;
 import std.random;
 import std.datetime.stopwatch : benchmark, StopWatch, AutoStart;
 import std.algorithm;
 
 void main()
 {
     auto sw = StopWatch(AutoStart.no);
     sw.start();
     int[] mylist;
Since the length of the array is already known beforehand, you could get significant speedups by preallocating the array:

```
int[] mylist = new int[100000];
for (int number ...)
{
    ...
    mylist[number] = n;
}
```
     for (int number = 0; number < 100000; ++number)
     {
         auto rnd = Random(unpredictableSeed);
[...]

Don't reseed the RNG on every loop iteration. (1) It's very inefficient and slow, and (2) it actually makes the output *less* random than if you seeded it only once at the start of the program. Move this outside the loop, and you should see some gains.
         auto n = uniform(0, 100, rnd);
         mylist ~= n;
     }
     mylist.sort();
     sw.stop();
     long msecs = sw.peek.total!"msecs";
     writefln("%s", msecs);
 }
[...]
 ```
Also, whenever performance matters, use gdc or ldc2 instead of dmd. Try `ldc2 -O2`, for example.

I did a quick test with LDC, with a side-by-side comparison of your original version and my improved version:

-------------
import std.stdio;
import std.random;
import std.datetime.stopwatch : benchmark, StopWatch, AutoStart;
import std.algorithm;

void original()
{
    auto sw = StopWatch(AutoStart.no);
    sw.start();
    int[] mylist;
    for (int number = 0; number < 100000; ++number)
    {
        auto rnd = Random(unpredictableSeed);
        auto n = uniform(0, 100, rnd);
        mylist ~= n;
    }
    mylist.sort();
    sw.stop();
    long msecs = sw.peek.total!"msecs";
    writefln("%s", msecs);
}

void improved()
{
    auto sw = StopWatch(AutoStart.no);
    sw.start();
    int[] mylist = new int[100000];
    auto rnd = Random(unpredictableSeed);
    for (int number = 0; number < 100000; ++number)
    {
        auto n = uniform(0, 100, rnd);
        mylist[number] = n;
    }
    mylist.sort();
    sw.stop();
    long msecs = sw.peek.total!"msecs";
    writefln("%s", msecs);
}

void main()
{
    original();
    improved();
}
-------------

Here's the typical output:

-------------
209
5
-------------

As you can see, that's a 40x improvement in speed. ;-)

Assuming that the ~209 msec on my PC corresponds with your observed 280 ms, and assuming that the 40x improvement will also apply on your machine, the improved version should run in about 9-10 msec. So this *should* give you a 4x speedup over the Python version, in theory. I'd love to see how it actually measures on your machine, if you don't mind. ;-)

T

-- 
Holding a grudge is like drinking poison and hoping the other person dies. -- seen on the 'Net
Apr 01
parent Nestor <nestor barriolinux.es> writes:
On Thursday, 1 April 2021 at 19:38:39 UTC, H. S. Teoh wrote:
 On Thu, Apr 01, 2021 at 04:52:17PM +0000, Nestor via 
 Digitalmars-d-learn wrote: [...]
     [...]
Since the length of the array is already known beforehand, you could get significant speedups by preallocating the array: [...]
First, thanks everyone!

I don't have ldc2 installed, so I skipped those suggestions.

I always suspected I was doing something wrong with the random generator. Somehow, in one test where I put the seed outside the loop I got the same (not so) random number every time, but that might have been a copy-paste error on my side.

Reserving the length of an integer list up front is something I am starting to value greatly; thanks for pointing it out.

I feel satisfied with the 12-30 ms I am getting now using rdmd :)

Thanks again Ali, Teoh, Steven, ag0aep6g, Imperatorn
Apr 02