digitalmars.D.announce - MIR vs. Numpy

reply Tobias Schmidt <tobias.ts.schmidt fau.de> writes:
Dear all,

to compare MIR and Numpy in the HPC context, we implemented a 
multigrid solver in Python using Numpy and in D using Mir and 
performed some benchmarks with them.

You can find our code and results here:
https://github.com/typohnebild/numpy-vs-mir

Feedback is very welcome. Please feel free to open issues, pull 
requests or simply post your thoughts below.

Kind regards,
Tobias
Nov 18
next sibling parent reply Bastiaan Veelo <Bastiaan Veelo.net> writes:
On Wednesday, 18 November 2020 at 10:05:06 UTC, Tobias Schmidt 
wrote:
 Dear all,

 to compare MIR and Numpy in the HPC context, we implemented a 
 multigrid solver in Python using Numpy and in D using Mir and 
 performed some benchmarks with them.

 You can find our code and results here:
 https://github.com/typohnebild/numpy-vs-mir
Nice numbers. I’m not a Python guy, but I was under the impression that Numpy is actually written in C, so that when you benchmark Numpy you’re mostly benchmarking C, not Python. I had therefore expected the Numpy performance to be much closer to D’s.

An important factor, I think, which I’m not sure you have discussed (I didn’t look too closely), is the compiler backend that was used to compile D and Numpy. Then again, as a user one is mostly interested in the out-of-the-box performance, which this seems to be a good measure of.

Bastiaan.
Nov 18
parent John Colvin <john.loughran.colvin gmail.com> writes:
On Wednesday, 18 November 2020 at 13:01:42 UTC, Bastiaan Veelo 
wrote:
 On Wednesday, 18 November 2020 at 10:05:06 UTC, Tobias Schmidt 
 wrote:
 Dear all,

 to compare MIR and Numpy in the HPC context, we implemented a 
 multigrid solver in Python using Numpy and in D using Mir and 
 performed some benchmarks with them.

 You can find our code and results here:
 https://github.com/typohnebild/numpy-vs-mir
Nice numbers. I’m not a Python guy but I was under the impression that Numpy actually is written in C, so that when you benchmark Numpy you’re mostly benchmarking C, not Python. Therefore I had expected the Numpy performance to be much closer to D’s. An important factor I think, which I’m not sure you have discussed (didn’t look too closely), is the compiler backend that was used to compile D and Numpy. Then again, as a user one is mostly interested in the out-of-the-box performance, which this seems to be a good measure of. — Bastiaan.
A lot of numpy is in C, C++, Fortran, asm, etc. But when you chain a bunch of things together, you are going via Python. The language boundary (and Python being slow) means that internal iteration in native code is a requirement for performance, which leads to eager allocation for composability via Python, which then hurts performance. Numpy makes a very good effort, but is always constrained by this.

Clever schemes with laziness, where operations in Python just compose operations for execution later or on demand, can work as an alternative, but a) that's hard and b) even if you can completely avoid calling back into Python during iteration, you would still need a JIT to really unlock the performance.

Julia fixes this by having all or most of the code in one language, which is JIT-compiled. D can do the same ahead of time (AOT) with templates, like C++/Eigen does, but with more flexible and less terrifying code. That's (one part of) what Mir provides.
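To make the eager-versus-lazy distinction concrete, here is a minimal sketch in pure Python. Plain lists stand in for arrays, and all function names are hypothetical; none of this is Numpy's or Mir's actual API. The eager variant allocates a full temporary at every step, the way chained array-returning calls would, while the lazy variant only composes the steps and evaluates them in one pass when the result is consumed.

```python
# Hypothetical illustration only: lists stand in for arrays, and these
# helpers are not part of Numpy, Mir, or any other library.

def eager_add(a, b):
    # Allocates and returns a complete result list.
    return [x + y for x, y in zip(a, b)]

def eager_scale(a, k):
    # Allocates and returns a complete result list.
    return [x * k for x in a]

def eager_expr(a, b, c):
    # (a + b) * 0.5 + c, with one full temporary per step.
    t1 = eager_add(a, b)       # temporary 1
    t2 = eager_scale(t1, 0.5)  # temporary 2
    return eager_add(t2, c)    # final result

def lazy_add(a, b):
    # Returns a generator: nothing is computed until it is consumed.
    return (x + y for x, y in zip(a, b))

def lazy_scale(a, k):
    return (x * k for x in a)

def lazy_expr(a, b, c):
    # Same expression, composed lazily; no intermediate lists exist.
    return lazy_add(lazy_scale(lazy_add(a, b), 0.5), c)

a = [1.0, 2.0, 3.0]
b = [3.0, 2.0, 1.0]
c = [10.0, 20.0, 30.0]

eager_result = eager_expr(a, b, c)
lazy_result = list(lazy_expr(a, b, c))  # evaluated here, in a single pass
```

Both variants give the same values. The lazy one avoids the temporaries, but every element still goes through the interpreter, which is the point about needing a JIT (or, in D, ahead-of-time template instantiation) to make the fused form fast.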
Nov 18
prev sibling next sibling parent reply jmh530 <john.michael.hall gmail.com> writes:
On Wednesday, 18 November 2020 at 10:05:06 UTC, Tobias Schmidt 
wrote:
 Dear all,

 to compare MIR and Numpy in the HPC context, we implemented a 
 multigrid solver in Python using Numpy and in D using Mir and 
 performed some benchmarks with them.

 You can find our code and results here:
 https://github.com/typohnebild/numpy-vs-mir

 Feedback is very welcome. Please feel free to open issues, pull 
 requests or simply post your thoughts below.

 Kind regards,
 Tobias
Very nice write-up. It's been a while since I've used numba, so I was a little confused about the numba 1 and numba 8 runs. It also looks like you are compiling with ldc using -mcpu=native --boundscheck=off. Why not -O as well?
Nov 18
next sibling parent reply 9il <ilyayaroshenko gmail.com> writes:
On Wednesday, 18 November 2020 at 13:14:37 UTC, jmh530 wrote:
 On Wednesday, 18 November 2020 at 10:05:06 UTC, Tobias Schmidt 
 wrote:

 It also looks like you are compiling on ldc with -mcpu=native 
 --boundscheck=off. Why not -O as well?
-O is added by DUB
Nov 18
next sibling parent Max Haughton <maxhaton gmail.com> writes:
On Wednesday, 18 November 2020 at 15:20:19 UTC, 9il wrote:
 On Wednesday, 18 November 2020 at 13:14:37 UTC, jmh530 wrote:
 On Wednesday, 18 November 2020 at 10:05:06 UTC, Tobias Schmidt 
 wrote:

 It also looks like you are compiling on ldc with -mcpu=native 
 --boundscheck=off. Why not -O as well?
-O is added by DUB
Just -O? LDC is quite impressive with LTO and cross-module inlining turned on.
Nov 18
prev sibling parent jmh530 <john.michael.hall gmail.com> writes:
On Wednesday, 18 November 2020 at 15:20:19 UTC, 9il wrote:
 [snip]

 -O is added by DUB
Ah, the -release-nobounds
Nov 18
prev sibling parent Tobias Schmidt <tobias.ts.schmidt fau.de> writes:
Thanks for all of your feedback!

On Wednesday, 18 November 2020 at 13:14:37 UTC, jmh530 wrote:
 It's been a while since I've used numba, so I was a little 
 confused on the numba 1 and numba 8 runs.
The number is the number of threads used in our runs. The prefix indicates whether numba was used (numba) or not (nonumba). We have added a section to clarify this. Thanks for the hint.
Nov 20
prev sibling parent 9il <ilyayaroshenko gmail.com> writes:
On Wednesday, 18 November 2020 at 10:05:06 UTC, Tobias Schmidt 
wrote:
 Dear all,

 to compare MIR and Numpy in the HPC context, we implemented a 
 multigrid solver in Python using Numpy and in D using Mir and 
 performed some benchmarks with them.

 You can find our code and results here:
 https://github.com/typohnebild/numpy-vs-mir

 Feedback is very welcome. Please feel free to open issues, pull 
 requests or simply post your thoughts below.

 Kind regards,
 Tobias
Thanks a lot! It is a huge benefit for Mir and D to have such high-quality benchmarks.

Python's sweep_3D accesses memory only once per element computation, while D's old sweep_slice accesses it 7 times. A PR [1] with a new version of sweep_slice has been added; I expect it to be at least twice as fast. The new sweep_slice uses a more D-ish approach and a single memory access per computed element.

[1] https://github.com/typohnebild/numpy-vs-mir/pull/1

Cheers,
Ilya
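For readers who have not looked at the repository, the following is a generic sketch of a single-pass sweep, not the code from the repository or from the PR. It assumes the kernel is a 7-point stencil relaxation (Gauss-Seidel style) for -Δu = f on a cubic 3D grid with spacing h; the post does not spell this out, so treat the stencil, the sign convention, and the function name `sweep_single_pass` as assumptions. The point is only that one fused loop visits each interior element once and does all the work for that element there.

```python
# Generic sketch under stated assumptions; NOT the repository's sweep_3D
# or sweep_slice. Nested lists stand in for a 3D array.

def sweep_single_pass(u, f, h):
    """One in-place relaxation sweep over the interior of a cubic grid.

    Assumes -laplace(u) = f discretized with a 7-point stencil, so each
    interior value becomes the average of its six neighbours plus a
    source term.
    """
    n = len(u)
    h2 = h * h
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            for k in range(1, n - 1):
                # All reads and the single write for this element happen
                # in one loop iteration.
                u[i][j][k] = (
                    u[i - 1][j][k] + u[i + 1][j][k]
                    + u[i][j - 1][k] + u[i][j + 1][k]
                    + u[i][j][k - 1] + u[i][j][k + 1]
                    + h2 * f[i][j][k]
                ) / 6.0
    return u

def make_grid(n, value):
    # Helper (hypothetical) to build an n x n x n grid of a constant.
    return [[[value for _ in range(n)] for _ in range(n)] for _ in range(n)]

# Tiny example: 3x3x3 grid, boundary fixed at 1, one interior point at 0.
u = make_grid(3, 1.0)
u[1][1][1] = 0.0
f = make_grid(3, 0.0)
sweep_single_pass(u, f, 1.0)
```

With a zero right-hand side the single interior point ends up as the average of its six boundary neighbours.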
Nov 18