digitalmars.D.announce - MIR vs. Numpy

Tobias Schmidt <tobias.ts.schmidt fau.de> writes:

Dear all,

to compare MIR and Numpy in the HPC context, we implemented a 
multigrid solver in Python using Numpy and in D using Mir and 
performed some benchmarks with them.

You can find our code and results here:
https://github.com/typohnebild/numpy-vs-mir

Feedback is very welcome. Please feel free to open issues, pull 
requests or simply post your thoughts below.

Kind regards,
Tobias

Nov 18 2020

Bastiaan Veelo <Bastiaan Veelo.net> writes:

On Wednesday, 18 November 2020 at 10:05:06 UTC, Tobias Schmidt 
wrote:
 Dear all,

 to compare MIR and Numpy in the HPC context, we implemented a 
 multigrid solver in Python using Numpy and in D using Mir and 
 performed some benchmarks with them.

 You can find our code and results here:
 https://github.com/typohnebild/numpy-vs-mir

Nice numbers. I’m not a Python guy but I was under the impression 
that Numpy actually is written in C, so that when you benchmark 
Numpy you’re mostly benchmarking C, not Python. Therefore I had 
expected the Numpy performance to be much closer to D’s. An 
important factor I think, which I’m not sure you have discussed 
(didn’t look too closely), is the compiler backend that was used 
to compile D and Numpy. Then again, as a user one is mostly 
interested in the out-of-the-box performance, which this seems to 
be a good measure of.

— Bastiaan.

Nov 18 2020

John Colvin <john.loughran.colvin gmail.com> writes:

On Wednesday, 18 November 2020 at 13:01:42 UTC, Bastiaan Veelo 
wrote:
 On Wednesday, 18 November 2020 at 10:05:06 UTC, Tobias Schmidt 
 wrote:
 Dear all,

 to compare MIR and Numpy in the HPC context, we implemented a 
 multigrid solver in Python using Numpy and in D using Mir and 
 performed some benchmarks with them.

 You can find our code and results here:
 https://github.com/typohnebild/numpy-vs-mir

 Nice numbers. I’m not a Python guy but I was under the 
 impression that Numpy actually is written in C, so that when 
 you benchmark Numpy you’re mostly benchmarking C, not Python. 
 Therefore I had expected the Numpy performance to be much 
 closer to D’s. An important factor I think, which I’m not sure 
 you have discussed (didn’t look too closely), is the compiler 
 backend that was used to compile D and Numpy. Then again, as a 
 user one is mostly interested in the out-of-the-box 
 performance, which this seems to be a good measure of.

 — Bastiaan.

A lot of NumPy is written in C, C++, Fortran, asm, etc.

But when you chain a bunch of operations together, you are going via 
Python. The language boundary (and Python being slow) means that 
internal iteration in native code is a requirement for 
performance. That leads to eager allocation of intermediate results 
so that operations can be composed from Python, which in turn hurts 
performance. NumPy makes a very good effort, but it is always 
constrained by this. Clever schemes with laziness, where operations 
in Python just compose operations for execution later or on demand, 
can work as an alternative, but a) that's hard and b) even if you 
can completely avoid calling back into Python during iteration, you 
would still need a JIT to really unlock the performance.

Julia fixes this by having all or most of the code in one language, 
which is JIT-compiled.

D can do the same ahead of time with templates, as C++/Eigen does, 
but with more flexible and less terrifying code. That's (one part 
of) what Mir provides.
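
To make the eager-versus-lazy point concrete, here is a minimal sketch in plain Python. It is an illustration only: lists stand in for arrays, and the function names (axpy_eager, axpy_fused, axpy_lazy) are hypothetical and not taken from NumPy, Mir or the benchmark repository. The eager version builds a full temporary for every step, the fused version does one pass with no intermediate, and the lazy version only describes the computation until something iterates over it.

```python
def axpy_eager(alpha, x, y):
    """Compute alpha * x + y one whole-array operation at a time."""
    # First pass: a full temporary list holding alpha * x.
    scaled = [alpha * xi for xi in x]
    # Second pass: another full list for the sum; 'scaled' is thrown away.
    return [si + yi for si, yi in zip(scaled, y)]


def axpy_fused(alpha, x, y):
    """Compute alpha * x + y in a single pass without an intermediate list."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]


def axpy_lazy(alpha, x, y):
    """Return a generator: nothing is computed until it is iterated."""
    return (alpha * xi + yi for xi, yi in zip(x, y))


x = [1.0, 2.0, 3.0]
y = [10.0, 20.0, 30.0]

print(axpy_eager(2.0, x, y))
print(axpy_fused(2.0, x, y))
print(list(axpy_lazy(2.0, x, y)))
```

All three produce the same values; they differ only in how many passes and temporaries are needed. In pure Python the per-element loop is slow in every variant, which is the reason a library called from Python prefers whole-array operations in native code, and why the fused form pays off mainly when the whole expression is compiled.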

Nov 18 2020

jmh530 <john.michael.hall gmail.com> writes:

On Wednesday, 18 November 2020 at 10:05:06 UTC, Tobias Schmidt 
wrote:
 Dear all,

 to compare MIR and Numpy in the HPC context, we implemented a 
 multigrid solver in Python using Numpy and in D using Mir and 
 performed some benchmarks with them.

 You can find our code and results here:
 https://github.com/typohnebild/numpy-vs-mir

 Feedback is very welcome. Please feel free to open issues, pull 
 requests or simply post your thoughts below.

 Kind regards,
 Tobias

Very nice write up.

It's been a while since I've used numba, so I was a little 
confused on the numba 1 and numba 8 runs.

It also looks like you are compiling on ldc with -mcpu=native 
--boundscheck=off. Why not -O as well?

Nov 18 2020

9il <ilyayaroshenko gmail.com> writes:

On Wednesday, 18 November 2020 at 13:14:37 UTC, jmh530 wrote:
 On Wednesday, 18 November 2020 at 10:05:06 UTC, Tobias Schmidt 
 wrote:

 It also looks like you are compiling on ldc with -mcpu=native 
 --boundscheck=off. Why not -O as well?

-O is added by DUB

Nov 18 2020

Max Haughton <maxhaton gmail.com> writes:

On Wednesday, 18 November 2020 at 15:20:19 UTC, 9il wrote:
 On Wednesday, 18 November 2020 at 13:14:37 UTC, jmh530 wrote:
 On Wednesday, 18 November 2020 at 10:05:06 UTC, Tobias Schmidt 
 wrote:

 It also looks like you are compiling on ldc with -mcpu=native 
 --boundscheck=off. Why not -O as well?

 -O is added by DUB

Just -O? LDC is quite impressive with lto and 
cross-module-inlining turned on

Nov 18 2020

jmh530 <john.michael.hall gmail.com> writes:

On Wednesday, 18 November 2020 at 15:20:19 UTC, 9il wrote:
 [snip]

 -O is added by DUB

Ah, the -release-nobounds

Nov 18 2020

Tobias Schmidt <tobias.ts.schmidt fau.de> writes:

Thanks for all of your feedback!

On Wednesday, 18 November 2020 at 13:14:37 UTC, jmh530 wrote:
 It's been a while since I've used numba, so I was a little 
 confused on the numba 1 and numba 8 runs.

The number is the number of threads used in our runs. The prefix 
indicates whether numba was used ('numba') or not ('nonumba').
We have added a section to clarify this. Thanks for the hint.

Nov 20 2020

9il <ilyayaroshenko gmail.com> writes:

On Wednesday, 18 November 2020 at 10:05:06 UTC, Tobias Schmidt 
wrote:
 Dear all,

 to compare MIR and Numpy in the HPC context, we implemented a 
 multigrid solver in Python using Numpy and in D using Mir and 
 performed some benchmarks with them.

 You can find our code and results here:
 https://github.com/typohnebild/numpy-vs-mir

 Feedback is very welcome. Please feel free to open issues, pull 
 requests or simply post your thoughts below.

 Kind regards,
 Tobias

Thanks a lot! It is a huge benefit for Mir and D to have such 
high-quality benchmarks.

Python's sweep_3D accesses memory only once per element 
computation, while D's old sweep_slice accesses it 7 times.

A PR [1] with a new version of sweep_slice has been added; I expect 
it to be at least twice as fast. The new sweep_slice uses a more 
D-ish approach and a single memory access per computed element.

[1] https://github.com/typohnebild/numpy-vs-mir/pull/1

Cheers,
Ilya
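
For readers unfamiliar with this kind of kernel, the following is a rough sketch of one in-place Gauss-Seidel sweep with a 7-point stencil on a 3D grid. It is not the repository's sweep_3D or sweep_slice: the names make_grid and sweep_7point are hypothetical, nested lists stand in for arrays, and the discretised equation (-laplace(u) = f with grid spacing h) is an assumption. Each update in this sketch reads seven values, the six neighbours plus the right-hand side; whether that corresponds to the seven accesses mentioned above is likewise an assumption.

```python
def make_grid(n, value=0.0):
    """Create an n x n x n grid as nested lists filled with 'value'."""
    return [[[value for _ in range(n)] for _ in range(n)] for _ in range(n)]


def sweep_7point(u, f, h):
    """One in-place Gauss-Seidel sweep over the interior points of u.

    Assumes the discretisation of -laplace(u) = f on a uniform grid,
    which gives u = (sum of six neighbours + h^2 * f) / 6.
    Boundary values of u are left untouched.
    """
    n = len(u)
    h2 = h * h
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            for k in range(1, n - 1):
                u[i][j][k] = (
                    u[i - 1][j][k] + u[i + 1][j][k]
                    + u[i][j - 1][k] + u[i][j + 1][k]
                    + u[i][j][k - 1] + u[i][j][k + 1]
                    + h2 * f[i][j][k]
                ) / 6.0


# Smallest possible case: a 3 x 3 x 3 grid has exactly one interior point.
u = make_grid(3, 1.0)   # boundary (and initial guess) set to 1
f = make_grid(3, 0.0)   # zero right-hand side
sweep_7point(u, f, h=1.0)
print(u[1][1][1])
```

With a constant boundary and a zero right-hand side, the interior point stays at the boundary value, which is a quick sanity check for the stencil weights.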

Nov 18 2020
