
digitalmars.D.learn - Need for speed

reply Nestor <nestor barriolinux.es> writes:
I am a Python programmer and I am enjoying Dlang and learning 
some programming insights along the way; thanks, everyone.

I have no formal education and also program JS and PHP.

Watching a video where a guy writes some simple code in Python 
and the same code in Go and compares their speed, I thought it 
could be a nice exercise for my learning path, so I ported the 
code to Dlang (hopefully correctly).

I was hoping to beat my dear Python and get results similar to 
Go, but that is not the case, neither using rdmd nor running the 
executable generated by dmd. I am getting values between 350-380 
ms, versus 81 ms in Python.

1. Am I doing something wrong in my code?
2. Do I have wrong expectations about Dlang?

Thanks in advance.

This is the video: https://www.youtube.com/watch?v=1Sban1F45jQ
This is my D code:
```
import std.stdio;
import std.random;
import std.datetime.stopwatch : benchmark, StopWatch, AutoStart;
import std.algorithm;

void main()
{
     auto sw = StopWatch(AutoStart.no);
     sw.start();
     int[] mylist;
     for (int number = 0; number < 100000; ++number)
     {
         auto rnd = Random(unpredictableSeed);
         auto n = uniform(0, 100, rnd);
         mylist ~= n;
     }
     mylist.sort();
     sw.stop();
     long msecs = sw.peek.total!"msecs";
     writefln("%s", msecs);
}
```

```
import time
import random

start = time.time()
mylist = []
for _ in range(100000):
     mylist.append(random.randint(0,100))
mylist.sort()
end = time.time()
print(f"{(end-start)*1000}ms")
```
Apr 01
next sibling parent reply ag0aep6g <anonymous example.com> writes:
On Thursday, 1 April 2021 at 16:52:17 UTC, Nestor wrote:
 I was hoping to beat my dear Python and get similar results to 
 Go, but that is not the case neither using rdmd nor running the 
 executable generated by dmd. I am getting values between 
 350-380 ms, and 81ms in Python.
[...]
 ```
     for (int number = 0; number < 100000; ++number)
     {
         auto rnd = Random(unpredictableSeed);
         auto n = uniform(0, 100, rnd);
         mylist ~= n;
     }
 ```

 ```
 for _ in range(100000):
     mylist.append(random.randint(0,100))
 ```
In the D version, you're re-seeding the random number generator on every loop. That takes time. You're not doing that in the Python version. Move `auto rnd = ...;` out of the loop, and you will get better times. Or just use the default generator with `uniform(0, 100)`.
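A minimal sketch of that fix, seeding once outside the loop (the `randomSorted` helper name is mine, not from the original post):

```d
import std.algorithm.sorting : isSorted, sort;
import std.random : Random, uniform, unpredictableSeed;

int[] randomSorted(size_t count)
{
    auto rnd = Random(unpredictableSeed); // seed once, not per iteration
    auto list = new int[count];           // length is known, so preallocate
    foreach (ref e; list)
        e = uniform(0, 100, rnd);
    list.sort();
    return list;
}

void main()
{
    auto list = randomSorted(100_000);
    assert(list.length == 100_000);
    assert(list.isSorted);
}
```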
Apr 01
parent =?UTF-8?Q?Ali_=c3=87ehreli?= <acehreli yahoo.com> writes:
On 4/1/21 10:15 AM, ag0aep6g wrote:

 Move `auto rnd = ...;` out of the loop, and you will get better times.
Doing that reduces the time about 15-fold. Using Appender reduces it further a tiny bit:

```
import std.array;
// ...
    Appender!(int[]) mylist;
// ...
    mylist.data.sort();
```

Ali
Apr 01
prev sibling next sibling parent reply Imperatorn <johan_forsberg_86 hotmail.com> writes:
On Thursday, 1 April 2021 at 16:52:17 UTC, Nestor wrote:
 I am a python programmer and I am enjoying Dlang and learning 
 some programming insights on the way, thank everyone.

 [...]
Could you also post the python code for comparison?
Apr 01
parent Imperatorn <johan_forsberg_86 hotmail.com> writes:
On Thursday, 1 April 2021 at 17:16:06 UTC, Imperatorn wrote:
 On Thursday, 1 April 2021 at 16:52:17 UTC, Nestor wrote:
 I am a python programmer and I am enjoying Dlang and learning 
 some programming insights on the way, thank everyone.

 [...]
Could you also post the python code for comparison?
Omg I totally missed it lol 😂
Apr 01
prev sibling next sibling parent Imperatorn <johan_forsberg_86 hotmail.com> writes:
On Thursday, 1 April 2021 at 16:52:17 UTC, Nestor wrote:
 I am a python programmer and I am enjoying Dlang and learning 
 some programming insights on the way, thank everyone.

 I have no formal education and also program JS and PHP.

 Watching a video where a guy programs some simple code in 
 Python and the same code in Go and compares speed I thought 
 that could be some nice exercise for my learning path and 
 successfully ported code to Dlang (hope so)

 I was hoping to beat my dear Python and get similar results to 
 Go, but that is not the case neither using rdmd nor running the 
 executable generated by dmd. I am getting values between 
 350-380 ms, and 81ms in Python.

 1- I am doing something wrong in my code?
 2- Do I have wrong expectations about Dlang?

 Thanks in advance.

 This is the video: https://www.youtube.com/watch?v=1Sban1F45jQ
 This is my D code:
 ```
 import std.stdio;
 import std.random;
 import std.datetime.stopwatch : benchmark, StopWatch, AutoStart;
 import std.algorithm;

 void main()
 {
     auto sw = StopWatch(AutoStart.no);
     sw.start();
     int[] mylist;
     for (int number = 0; number < 100000; ++number)
     {
         auto rnd = Random(unpredictableSeed);
         auto n = uniform(0, 100, rnd);
         mylist ~= n;
     }
     mylist.sort();
     sw.stop();
     long msecs = sw.peek.total!"msecs";
     writefln("%s", msecs);
 }
 ```

 ```
 import time
 import random

 start = time.time()
 mylist = []
 for _ in range(100000):
     mylist.append(random.randint(0,100))
 mylist.sort()
 end = time.time()
 print(f"{(end-start)*1000}ms")
 ```
Now when I actually read what you wrote, I tested moving `auto rnd = Random(unpredictableSeed)` outside the loop and got 481 ms for the first version vs 34 ms for the other.

---
auto sw = StopWatch(AutoStart.no);
sw.start();
int[] mylist;
auto rnd = Random(unpredictableSeed);
for (int number = 0; number < 100000; ++number)
{
    auto n = uniform(0, 100, rnd);
    mylist ~= n;
}
mylist.sort();
sw.stop();
long msecs = sw.peek.total!"msecs";
writefln("%s", msecs);
---

Also, one thing you could do is parallelize for even faster performance. I can show you how to do that later if you want to.
Apr 01
prev sibling next sibling parent reply Chris Piker <chris hoopjump.com> writes:
On Thursday, 1 April 2021 at 16:52:17 UTC, Nestor wrote:
 I was hoping to beat my dear Python and get similar results to 
 Go, but that is not the case neither using rdmd nor running the 
 executable generated by dmd. I am getting values between 
 350-380 ms, and 81ms in Python.
Nice test. I'm new to D as well and can't comment on needed refactoring. To confirm your results I compiled the D example using:

```
gdc -O2 speed.d -o speed
```

and measured 129 ms for the D program and 63 ms for the python3 equivalent.

I'll be keen to see how this plays out, since I'm using D as a faster alternative to Python.
Apr 01
next sibling parent Imperatorn <johan_forsberg_86 hotmail.com> writes:
On Thursday, 1 April 2021 at 17:30:15 UTC, Chris Piker wrote:
 On Thursday, 1 April 2021 at 16:52:17 UTC, Nestor wrote:
 I was hoping to beat my dear Python and get similar results to 
 Go, but that is not the case neither using rdmd nor running 
 the executable generated by dmd. I am getting values between 
 350-380 ms, and 81ms in Python.
 Nice test. I'm new to D as well and can't comment on needed 
 refactoring. To confirm your results I compiled the D example 
 using:

 ```
 gdc -O2 speed.d -o speed
 ```

 and measured 129 ms for the D program and 63 ms for the python3 
 equivalent. I'll be keen to see how this plays out since I'm 
 using D as a faster alternative to python.
If I make more changes in D I get below 10ms
Apr 01
prev sibling parent Imperatorn <johan_forsberg_86 hotmail.com> writes:
On Thursday, 1 April 2021 at 17:30:15 UTC, Chris Piker wrote:
 On Thursday, 1 April 2021 at 16:52:17 UTC, Nestor wrote:
 I was hoping to beat my dear Python and get similar results to 
 Go, but that is not the case neither using rdmd nor running 
 the executable generated by dmd. I am getting values between 
 350-380 ms, and 81ms in Python.
 Nice test. I'm new to D as well and can't comment on needed 
 refactoring. To confirm your results I compiled the D example 
 using:

 ```
 gdc -O2 speed.d -o speed
 ```

 and measured 129 ms for the D program and 63 ms for the python3 
 equivalent. I'll be keen to see how this plays out since I'm 
 using D as a faster alternative to python.
I have made 4 variants of the code and get:

- 20 ms (mylist ~= n and mylist.sort)
- 8 ms (mylist[number] = n and concurrent sort)
- 10 ms (parallel assignment and mylist.sort)
- 5 ms (parallel assignment and concurrent sort)

Also, make sure you build your application in release mode; this made quite some difference.
Apr 01
prev sibling next sibling parent reply Berni44 <someone somemail.com> writes:
On Thursday, 1 April 2021 at 16:52:17 UTC, Nestor wrote:
 I was hoping to beat my dear Python and get similar results to 
 Go, but that is not the case neither using rdmd nor running the 
 executable generated by dmd. I am getting values between 
 350-380 ms, and 81ms in Python.
Try using ldc2 instead of dmd:

```
ldc2 -O3 -release -boundscheck=off -flto=full -defaultlib=phobos2-ldc-lto,druntime-ldc-lto speed.d
```

should produce much better results.
Apr 01
next sibling parent Imperatorn <johan_forsberg_86 hotmail.com> writes:
On Thursday, 1 April 2021 at 19:00:08 UTC, Berni44 wrote:
 On Thursday, 1 April 2021 at 16:52:17 UTC, Nestor wrote:
 I was hoping to beat my dear Python and get similar results to 
 Go, but that is not the case neither using rdmd nor running 
 the executable generated by dmd. I am getting values between 
 350-380 ms, and 81ms in Python.
 Try using ldc2 instead of dmd:

 ```
 ldc2 -O3 -release -boundscheck=off -flto=full 
 -defaultlib=phobos2-ldc-lto,druntime-ldc-lto speed.d
 ```

 should produce much better results.

It did! Tried those flags with dmd and ldc and got the following times (ms) for the approaches I had earlier (two runs each):

```
DMD 11 7  6 4
DMD 15 7 10 6
LDC  6 7  9 6
LDC 12 6  8 5
```
It did! Tried those flags with dmd and ldc and got the following (ms) for the approaches I had earlier (made two runs for each) DMD 11 7 6 4 DMD 15 7 10 6 LDC 6 7 9 6 LDC 12 6 8 5
Apr 01
prev sibling next sibling parent reply matheus <matheus gmail.com> writes:
On Thursday, 1 April 2021 at 19:00:08 UTC, Berni44 wrote:
 Try using ldc2 instead of dmd:

 ```
 ldc2 -O3 -release -boundscheck=off -flto=full 
 -defaultlib=phobos2-ldc-lto,druntime-ldc-lto speed.d
 ```

 should produce much better results.
Since this is the "Learn" part of the forum, be careful with `-boundscheck=off`.

I mean, for this little snippet it's OK, but for other projects it may be wrong, as it says here: https://dlang.org/dmd-windows.html#switch-boundscheck

"This option should be used with caution and as a last resort to improve performance. Confirm turning off @safe bounds checks is worthwhile by benchmarking."

Matheus.
Apr 01
parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Thu, Apr 01, 2021 at 07:25:53PM +0000, matheus via Digitalmars-d-learn wrote:
[...]
 Since this is a "Learn" part of the Foruam, be careful with
 "-boundscheck=off".
 
 I mean for this little snippet is OK, but for a other projects this my
 be wrong, and as it says here:
 https://dlang.org/dmd-windows.html#switch-boundscheck
 
 "This option should be used with caution and as a last resort to
 improve performance. Confirm turning off  safe bounds checks is
 worthwhile by benchmarking."
[...]

It's interesting that whenever a question about D's performance pops up in the forums, people tend to reach for optimization flags. I wouldn't say it doesn't help; but I've found that significant performance improvements can usually be obtained by examining the code first, and catching common newbie mistakes. Those usually account for the majority of the observed performance degradation. Only after the code has been cleaned up and obvious mistakes fixed is it worth reaching for optimization flags, IMO.

Common mistakes I've noticed include:

- Constructing large arrays by appending 1 element at a time with `~`. Obviously, this requires many array reallocations and the associated copying; not to mention greatly increased GC load that could have been easily avoided by preallocation or by using std.array.appender.

- Failing to move repeated computations (esp. inefficient ones) outside the inner loop. Sometimes a good optimizing compiler is able to hoist them out automatically, but not always.

- Constructing lots of temporaries in inner loops as heap-allocated classes instead of by-value structs: the former leads to heavy GC load, not to mention memory allocation is generally slow and should be avoided inside inner loops. Heap-allocated objects also require indirections, which slow things down even more. By-value structs can be passed around in registers: no GC pressure, no indirections; so they can significantly improve performance.

- Using O(N^2) (or other super-linear) algorithms on large data sets where a more efficient algorithm is available. This one ought to speak for itself. :-D Nevertheless it still crops up from time to time, so it deserves to be mentioned again.

T

-- 
Those who don't understand Unix are condemned to reinvent it, poorly.
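As a rough sketch of the first point (the helper names here are mine, and the exact timings will vary by machine):

```d
import std.array : appender;
import std.datetime.stopwatch : StopWatch, AutoStart;
import std.stdio : writefln;

enum N = 1_000_000;

// Appends one element at a time; may trigger repeated reallocations.
int[] byConcat()
{
    int[] a;
    foreach (i; 0 .. N)
        a ~= i;
    return a;
}

// Reserves the full capacity up front via std.array.appender.
int[] byAppender()
{
    auto app = appender!(int[]);
    app.reserve(N);
    foreach (i; 0 .. N)
        app.put(i);
    return app.data;
}

void main()
{
    auto sw = StopWatch(AutoStart.yes);
    auto a = byConcat();
    writefln("~=       : %s usecs", sw.peek.total!"usecs");

    sw.reset();
    auto b = byAppender();
    writefln("appender : %s usecs", sw.peek.total!"usecs");

    assert(a == b); // both strategies produce the same array
}
```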
Apr 01
next sibling parent reply =?UTF-8?Q?Ali_=c3=87ehreli?= <acehreli yahoo.com> writes:
On 4/1/21 12:55 PM, H. S. Teoh wrote:

 - Constructing large arrays by appending 1 element at a time with `~`.
    Obviously, this requires many array reallocations and the associated
    copying
And that may not be a contributing factor. :) The following program sees just 15 allocations and 1722 element copies for 1 million appending operations:

```
import std.stdio;

void main() {
    int[] arr;
    auto place = arr.ptr;
    size_t relocated = 0;
    size_t copied = 0;

    foreach (i; 0 .. 1_000_000) {
        arr ~= i;

        if (arr.ptr != place) {
            ++relocated;
            copied += arr.length - 1;
            place = arr.ptr;
        }
    }

    writeln("relocated: ", relocated);
    writeln("copied   : ", copied);
}
```

This is because the GC does not allocate if there are unused pages right after the array. (However, increasing the element count to 10 million increases allocations slightly to 18, but element copies jump to 8 million.)

Ali
Apr 01
parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Thu, Apr 01, 2021 at 01:17:15PM -0700, Ali ehreli via Digitalmars-d-learn
wrote:
 On 4/1/21 12:55 PM, H. S. Teoh wrote:
 
 - Constructing large arrays by appending 1 element at a time with
 `~`.  Obviously, this requires many array reallocations and the
 associated copying
 And that may not be a contributing factor. :) The following 
 program sees just 15 allocations and 1722 element copies for 1 
 million appending operations:
[...]
 This is because the GC does not allocate if there are unused pages
 right after the array.
Right, but in a typical program it's unpredictable whether there will be unused pages after the array.
 (However, increasing the element count to 10 million increases
 allocations slightly to 18 but element copies jump to 8 million.)
[...]

Thanks for the very interesting information; so it looks like most of the time is actually spent copying array elements rather than doing anything else!

T

-- 
Let's eat some disquits while we format the biskettes.
Apr 01
parent reply Imperatorn <johan_forsberg_86 hotmail.com> writes:
On Thursday, 1 April 2021 at 21:13:18 UTC, H. S. Teoh wrote:
 On Thu, Apr 01, 2021 at 01:17:15PM -0700, Ali Çehreli via 
 Digitalmars-d-learn wrote:
 [...]
[...]
 [...]
Right, but in a typical program it's unpredictable whether there will be unused pages after the array.
 [...]
 [...]

 Thanks for the very interesting information; so it looks like 
 most of the time spent is actually in copying array elements 
 than anything else!

 T
Sorting takes longer proportionally though
Apr 01
parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Thu, Apr 01, 2021 at 09:16:09PM +0000, Imperatorn via Digitalmars-d-learn
wrote:
 On Thursday, 1 April 2021 at 21:13:18 UTC, H. S. Teoh wrote:
[...]
 Thanks for the very interesting information; so it looks like most
 of the time spent is actually in copying array elements than
 anything else!
[...]
 Sorting takes longer proportionally though
I meant that most of the time incurred by appending to the array element by element is spent copying elements. Obviously, sorting will not be as fast as copying array elements.

T

-- 
What do you mean the Internet isn't filled with subliminal messages? What about all those buttons marked "submit"??
Apr 01
prev sibling parent reply Jon Degenhardt <jond noreply.com> writes:
On Thursday, 1 April 2021 at 19:55:05 UTC, H. S. Teoh wrote:
 On Thu, Apr 01, 2021 at 07:25:53PM +0000, matheus via 
 Digitalmars-d-learn wrote: [...]
 Since this is a "Learn" part of the Foruam, be careful with 
 "-boundscheck=off".
 
 I mean for this little snippet is OK, but for a other projects 
 this my be wrong, and as it says here: 
 https://dlang.org/dmd-windows.html#switch-boundscheck
 
 "This option should be used with caution and as a last resort 
 to improve performance. Confirm turning off  safe bounds 
 checks is worthwhile by benchmarking."
 [...]

 It's interesting that whenever a question about D's performance 
 pops up in the forums, people tend to reach for optimization 
 flags. I wouldn't say it doesn't help; but I've found that 
 significant performance improvements can usually be obtained by 
 examining the code first, and catching common newbie mistakes. 
 Those usually account for the majority of the observed 
 performance degradation.

 Only after the code has been cleaned up and obvious mistakes 
 fixed, is it worth reaching for optimization flags, IMO.
This is my experience as well, and not just for D. Pick good algorithms and pay attention to memory allocation. Don't go crazy on the latter: many people try to avoid the GC at all costs, but I don't usually find it necessary to go quite that far. Very often, simply reusing already-allocated memory does the trick.

The blog post I wrote a few years ago focuses on these ideas: https://dlang.org/blog/2017/05/24/faster-command-line-tools-in-d/

--Jon
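A tiny sketch of the reuse idea (the `squares*` helpers are hypothetical; the point is that the buffer is allocated once, outside the loop):

```d
// Allocating inside the loop: one GC allocation per call.
int[] squaresAllocating(int n)
{
    auto tmp = new int[n];
    foreach (i, ref e; tmp)
        e = cast(int)(i * i);
    return tmp;
}

// Reusing a caller-provided buffer: no per-call allocation.
int[] squaresReusing(int[] buf, int n)
{
    assert(buf.length >= n);
    auto tmp = buf[0 .. n]; // slice into the preallocated buffer
    foreach (i, ref e; tmp)
        e = cast(int)(i * i);
    return tmp;
}

void main()
{
    auto buf = new int[64]; // allocated once, recycled below
    foreach (iter; 0 .. 100_000)
    {
        auto s = squaresReusing(buf, 32);
        assert(s[5] == 25);
    }
}
```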
Apr 01
parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Fri, Apr 02, 2021 at 02:36:21AM +0000, Jon Degenhardt via
Digitalmars-d-learn wrote:
 On Thursday, 1 April 2021 at 19:55:05 UTC, H. S. Teoh wrote:
[...]
 It's interesting that whenever a question about D's performance pops
 up in the forums, people tend to reach for optimization flags.  I
 wouldn't say it doesn't help; but I've found that significant
 performance improvements can usually be obtained by examining the
 code first, and catching common newbie mistakes.  Those usually
 account for the majority of the observed performance degradation.
 
 Only after the code has been cleaned up and obvious mistakes fixed,
 is it worth reaching for optimization flags, IMO.
 This is my experience as well, and not just for D. Pick good 
 algorithms and pay attention to memory allocation. Don't go 
 crazy on the latter. Many people try to avoid GC at all costs, 
 but I don't usually find it necessary to go quite that far. Very 
 often simply reusing already allocated memory does the trick.
I've been saying this for years: the GC is (usually) not evil. It's often quite easy to optimize away the main bottlenecks, and any remaining problem becomes not so important anymore. For example, see this thread:

https://forum.dlang.org/post/mailman.1589.1415314819.9932.digitalmars-d puremagic.com

which is continued here (for some reason it was split -- the bad ole Mailman bug, IIRC):

https://forum.dlang.org/post/mailman.1590.1415315739.9932.digitalmars-d puremagic.com
From a starting point of about 20 seconds total running time, I reduced it to about 6 seconds by the following fixes:

1) Reduce GC collection frequency: call GC.disable at the start of the program, then manually call GC.collect periodically.
2) Eliminate autodecoding (using .representation or .byChar).
3) Rewrite a hot inner loop using pointers instead of .countUntil.
4) Refactor the code to eliminate a redundant computation from an inner loop.
5) Judicious use of .assumeSafeAppend to prevent excessive array reallocations.
6) (Not described in the thread, but applied later) Reduce GC load even further by reusing an array that was being allocated per iteration in an inner loop before.

Of the above, (1), (2), (3), and (5) require only very small code changes. (4) and (6) were a little more tricky, but were pretty localised changes that did not take a long time to implement or affect a lot of code. They were all implemented in a short span of 2-3 days. Compare this with outright writing @nogc code, which would require a LOT more time & effort.
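A sketch of technique (1), using `core.memory.GC` (note the actual API calls are `GC.disable`/`GC.enable`; the loop body here is just a stand-in for real allocation-heavy work):

```d
import core.memory : GC;

size_t doWork()
{
    GC.disable();             // suspend automatic collections
    scope (exit) GC.enable(); // restore normal behaviour on exit

    size_t done;
    foreach (i; 0 .. 1_000)
    {
        // Stand-in for allocation-heavy work.
        auto chunk = new ubyte[64 * 1024];
        chunk[0] = cast(ubyte) i;
        ++done;

        // Collect at a frequency we choose, rather than
        // whenever the GC decides to interrupt us.
        if (i % 100 == 0)
            GC.collect();
    }
    return done;
}

void main()
{
    assert(doWork() == 1_000);
}
```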
 The blog post I wrote a few years ago focuses on these ideas:
 https://dlang.org/blog/2017/05/24/faster-command-line-tools-in-d/
Very nice, and matches my experience with optimizing D code.

T

-- 
Live a century, learn a century; you'll still die a fool. (Russian proverb)
Apr 01
parent reply =?UTF-8?Q?Ali_=c3=87ehreli?= <acehreli yahoo.com> writes:
On 4/1/21 9:01 PM, H. S. Teoh wrote:

 6) (Not described in the thread, but applied later) Reduce GC load even
     further by reusing an array that was being allocated per iteration in
     an inner loop before.
For those who prefer a video description with some accent :) here is how to apply that technique:

https://www.youtube.com/watch?v=dRORNQIB2wA&t=787s

And here is how to profile[1] the program to see where the allocations occur:

https://www.youtube.com/watch?v=dRORNQIB2wA&t=630s

Ali

[1] Unfortunately, the profiler has some bugs, which cause segmentation faults in some cases.
Apr 02
parent drug <drug2004 bk.ru> writes:
02.04.2021 15:06, Ali Çehreli writes:
 
 For those who prefer a video description with some accent :) here is how 
Speaking of accents, I'm curious what you would say about this old Russian sketch about English and its dialects (in English, no Facebook account required):

https://www.facebook.com/ABBYY.Lingvo/videos/954190547976075

Skip the first 50 seconds to get to the English part.
Apr 02
prev sibling parent reply ag0aep6g <anonymous example.com> writes:
On 01.04.21 21:00, Berni44 wrote:
 ```
 ldc2 -O3 -release -boundscheck=off -flto=full 
 -defaultlib=phobos2-ldc-lto,druntime-ldc-lto speed.d
 ```
Please don't recommend `-boundscheck=off` to newbies. It's not just an optimization. It breaks @safe. If you want to do welding without eye protection, that's on you. But please don't recommend it to the new guy.
Apr 01
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 4/1/21 3:27 PM, ag0aep6g wrote:
 On 01.04.21 21:00, Berni44 wrote:
 ```
 ldc2 -O3 -release -boundscheck=off -flto=full 
 -defaultlib=phobos2-ldc-lto,druntime-ldc-lto speed.d
 ```
 Please don't recommend `-boundscheck=off` to newbies. It's not 
 just an optimization. It breaks @safe. If you want to do welding 
 without eye protection, that's on you. But please don't 
 recommend it to the new guy.
Yes, but you can recommend `-boundscheck=safeonly`, which leaves bounds checking on for @safe code. Though I personally leave it on for everything.

-Steve
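To illustrate what `safeonly` keeps: in @safe code an out-of-bounds index still fails loudly instead of silently corrupting memory. A minimal sketch (the `readAt` helper is mine; built without `-boundscheck=off`):

```d
import core.exception : RangeError;
import std.exception : assertThrown;

@safe int readAt(int[] a, size_t i)
{
    return a[i]; // bounds-checked in @safe code under -boundscheck=safeonly
}

void main() // @system harness, so we are allowed to catch the RangeError
{
    auto a = [10, 20, 30];
    assert(readAt(a, 2) == 30);            // in bounds: fine
    assertThrown!RangeError(readAt(a, 3)); // out of bounds: fails loudly,
                                           // not undefined behaviour
}
```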
Apr 01
parent reply ag0aep6g <anonymous example.com> writes:
On 01.04.21 21:36, Steven Schveighoffer wrote:
 On 4/1/21 3:27 PM, ag0aep6g wrote:
 On 01.04.21 21:00, Berni44 wrote:
 ```
 ldc2 -O3 -release -boundscheck=off -flto=full 
 -defaultlib=phobos2-ldc-lto,druntime-ldc-lto speed.d
 ```
[...]
 Yes, but you can recommend `-boundscheck=safeonly`, which leaves it on 
 for  safe code.
`-O -release` already does that, doesn't it?
Apr 01
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 4/1/21 3:44 PM, ag0aep6g wrote:
 On 01.04.21 21:36, Steven Schveighoffer wrote:
 On 4/1/21 3:27 PM, ag0aep6g wrote:
 On 01.04.21 21:00, Berni44 wrote:
 ```
 ldc2 -O3 -release -boundscheck=off -flto=full 
 -defaultlib=phobos2-ldc-lto,druntime-ldc-lto speed.d
 ```
[...]
 Yes, but you can recommend `-boundscheck=safeonly`, which leaves it on 
 for  safe code.
`-O -release` already does that, doesn't it?
Maybe, but I wasn't responding to that, just to your statement not to recommend -boundscheck=off. In any case, it wouldn't hurt, right? I don't know what -O3 and -release do on ldc.

-Steve
Apr 01
parent ag0aep6g <anonymous example.com> writes:
On 01.04.21 21:53, Steven Schveighoffer wrote:
 Maybe, but I wasn't responding to that, just your statement not to 
 recommend -boundscheck=off. In any case, it wouldn't hurt, right?
Right.
Apr 01
prev sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Thu, Apr 01, 2021 at 04:52:17PM +0000, Nestor via Digitalmars-d-learn wrote:
[...]
 ```
 import std.stdio;
 import std.random;
 import std.datetime.stopwatch : benchmark, StopWatch, AutoStart;
 import std.algorithm;
 
 void main()
 {
     auto sw = StopWatch(AutoStart.no);
     sw.start();
     int[] mylist;
Since the length of the array is already known beforehand, you could get significant speedups by preallocating the array:

```
int[] mylist = new int[100000];
for (int number ...)
{
    ...
    mylist[number] = n;
}
```
     for (int number = 0; number < 100000; ++number)
     {
         auto rnd = Random(unpredictableSeed);
[...]

Don't reseed the RNG on every loop iteration. (1) It's very inefficient and slow, and (2) it actually makes the output *less* random than if you seeded it only once at the start of the program. Move this outside the loop, and you should see some gains.
         auto n = uniform(0, 100, rnd);
         mylist ~= n;
     }
     mylist.sort();
     sw.stop();
     long msecs = sw.peek.total!"msecs";
     writefln("%s", msecs);
 }
[...]
 ```
Also, whenever performance matters, use gdc or ldc2 instead of dmd. Try `ldc2 -O2`, for example.

I did a quick test with LDC, with a side-by-side comparison of your original version and my improved version:

-------------
import std.stdio;
import std.random;
import std.datetime.stopwatch : benchmark, StopWatch, AutoStart;
import std.algorithm;

void original()
{
    auto sw = StopWatch(AutoStart.no);
    sw.start();
    int[] mylist;
    for (int number = 0; number < 100000; ++number)
    {
        auto rnd = Random(unpredictableSeed);
        auto n = uniform(0, 100, rnd);
        mylist ~= n;
    }
    mylist.sort();
    sw.stop();
    long msecs = sw.peek.total!"msecs";
    writefln("%s", msecs);
}

void improved()
{
    auto sw = StopWatch(AutoStart.no);
    sw.start();
    int[] mylist = new int[100000];
    auto rnd = Random(unpredictableSeed);
    for (int number = 0; number < 100000; ++number)
    {
        auto n = uniform(0, 100, rnd);
        mylist[number] = n;
    }
    mylist.sort();
    sw.stop();
    long msecs = sw.peek.total!"msecs";
    writefln("%s", msecs);
}

void main()
{
    original();
    improved();
}
-------------

Here's the typical output:

-------------
209
5
-------------

As you can see, that's a 40x improvement in speed. ;-)

Assuming that the ~209 msec on my PC corresponds with your observed 280 ms, and assuming that the 40x improvement will also apply on your machine, the improved version should run in about 9-10 msec. So this *should* give you a 4x speedup over the Python version, in theory. I'd love to see how it actually measures on your machine, if you don't mind. ;-)

T

-- 
Holding a grudge is like drinking poison and hoping the other person dies. -- seen on the 'Net
Apr 01
parent Nestor <nestor barriolinux.es> writes:
On Thursday, 1 April 2021 at 19:38:39 UTC, H. S. Teoh wrote:
 On Thu, Apr 01, 2021 at 04:52:17PM +0000, Nestor via 
 Digitalmars-d-learn wrote: [...]
     [...]
Since the length of the array is already known beforehand, you could get significant speedups by preallocating the array: [...]
First, thanks everyone!

I don't have ldc2 installed, so I skipped those suggestions.

I always suspected I was doing something wrong with the random generator. Somehow, in one test where I put the seed outside the loop I got the same (not so) random number every time, but that might have been a copy-paste error on my side.

Reserving the length of an integer list up front is something I am starting to value greatly; thanks for pointing it out.

I feel satisfied with the 12-30 ms I am getting now using rdmd :)

Thanks again Ali, Teoh, Steven, ag0aep6g, Imperatorn
Apr 02