digitalmars.D - Mir GLAS vs Intel MKL: which is faster?

reply Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
Yesterday I announced [1] a blog post [2] about the Mir [3] benchmark.
Intel MKL and Apple Accelerate were added to the benchmark today.

Please help improve the blog post this weekend. It will
be announced on Reddit.

[1] 
http://forum.dlang.org/thread/yhfbuxnrqkiqtvsnvngf forum.dlang.org
[2] 
http://blog.mir.dlang.io/glas/benchmark/openblas/2016/09/23/glas-gemm-benchmark.html
[3] http://mir.dlang.io
Sep 24 2016
next sibling parent reply Martin Nowak <code dawg.eu> writes:
On Saturday, 24 September 2016 at 07:20:25 UTC, Ilya Yaroshenko 
wrote:
 Yesterday I announced [1] a blog post [2] about the Mir [3]
 benchmark. Intel MKL and Apple Accelerate were added to the
 benchmark today.

 Please help improve the blog post this weekend. It
 will be announced on Reddit.
Let me run that on a desktop machine before you publish the results; I have a Core i7-6700 w/ 2133 MHz DDR4 RAM here. Mobile CPUs often don't reproduce the same numbers, e.g. https://github.com/dlang/druntime/pull/1603#issuecomment-231543115.
Sep 24 2016
parent reply Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Saturday, 24 September 2016 at 08:13:22 UTC, Martin Nowak 
wrote:
 On Saturday, 24 September 2016 at 07:20:25 UTC, Ilya Yaroshenko 
 wrote:
 Yesterday I announced [1] a blog post [2] about the Mir [3]
 benchmark. Intel MKL and Apple Accelerate were added to the
 benchmark today.

 Please help improve the blog post this weekend. It
 will be announced on Reddit.
Let me run that on a desktop machine before you publish the results; I have a Core i7-6700 w/ 2133 MHz DDR4 RAM here. Mobile CPUs often don't reproduce the same numbers, e.g. https://github.com/dlang/druntime/pull/1603#issuecomment-231543115.
This will be a good addition, thank you! Please use `dub build ...` and then run the report at least 2 times, and choose the better one. GEMM uses the CPU cache intensively, and the OS and other apps may significantly hurt performance. So it makes sense to rerun the test if something looks wrong. Benchmark code: https://github.com/libmir/mir/blob/master/benchmarks/glas/gemm_report.d You can ping me on Gitter or open an issue if you need help with the benchmark setup. Gitter: https://gitter.im/libmir/public
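To make the "best of two runs" procedure concrete, here is a minimal D sketch of the idea (illustrative only; the real measurement logic lives in the linked gemm_report.d):

    import std.datetime.stopwatch : StopWatch, AutoStart;
    import std.algorithm.comparison : min;
    import core.time : Duration;

    // Time a single invocation of the workload.
    Duration timeOnce(scope void delegate() run)
    {
        auto sw = StopWatch(AutoStart.yes);
        run();
        return sw.peek;
    }

    // Keep the best (lowest) time over n runs: interference from the OS
    // and other apps only ever makes a run slower, so the minimum is the
    // closest estimate of the kernel's true speed.
    Duration bestOf(size_t n, scope void delegate() run)
    {
        auto best = Duration.max;
        foreach (_; 0 .. n)
            best = min(best, timeOnce(run));
        return best;
    }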
Sep 24 2016
parent reply Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Saturday, 24 September 2016 at 09:14:38 UTC, Ilya Yaroshenko 
wrote:
 Please use `dub build ...` and then run the report at least 2
 times, and choose the better one.
Is this what you mean by your description of the results as e.g. "single precision numbers x2", "double precision numbers x2", etc.? Might be better, instead of the "x2", to offer a small one-sentence description, e.g. "Each benchmark was run twice for each matrix size, and the better of the two runs was chosen in each case."
Sep 26 2016
parent reply Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Monday, 26 September 2016 at 09:46:50 UTC, Joseph Rushton 
Wakeling wrote:
 On Saturday, 24 September 2016 at 09:14:38 UTC, Ilya Yaroshenko 
 wrote:
 Please use `dub build ...` and then run the report at least 2
 times, and choose the better one.
Is this what you mean by your description of the results as e.g. "single precision numbers x2", "double precision numbers x2", etc.? Might be better, instead of the "x2", to offer a small one-sentence description, e.g. "Each benchmark was run twice for each matrix size, and the better of the two runs was chosen in each case."
I mean that for single precision numbers I have 2 charts (normal and normalized).
Sep 26 2016
parent reply Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Monday, 26 September 2016 at 10:01:44 UTC, Ilya Yaroshenko 
wrote:
 I mean that for single precision numbers I have 2 charts 
 (normal and normalized).
Ah, OK. Would still be nice to have a note, though, on how the numbers in the charts are generated, i.e. are they the result of a single run, best of N, average of N ... ?
Sep 26 2016
parent Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Monday, 26 September 2016 at 11:03:40 UTC, Joseph Rushton 
Wakeling wrote:
 On Monday, 26 September 2016 at 10:01:44 UTC, Ilya Yaroshenko 
 wrote:
 I mean that for single precision numbers I have 2 charts 
 (normal and normalized).
Ah, OK. Would still be nice to have a note, though, on how the numbers in the charts are generated, i.e. are they the result of a single run, best of N, average of N ... ?
The data is the same. The first chart represents absolute values, the second chart represents normalised values. Will add.
Sep 26 2016
prev sibling next sibling parent reply rikki cattermole <rikki cattermole.co.nz> writes:
On 24/09/2016 7:20 PM, Ilya Yaroshenko wrote:
 Yesterday I announced [1] a blog post [2] about the Mir [3]
 benchmark. Intel MKL and Apple Accelerate were added to the
 benchmark today.

 Please help improve the blog post this weekend. It will be
 announced on Reddit.

 [1] http://forum.dlang.org/thread/yhfbuxnrqkiqtvsnvngf forum.dlang.org
 [2]
 http://blog.mir.dlang.io/glas/benchmark/openblas/2016/09/23/glas-gemm-benchmark.html

 [3] http://mir.dlang.io
For giggles, can we get a comparison against dmc for Intel MKL, assuming of course it compiles?
Sep 24 2016
parent Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Saturday, 24 September 2016 at 12:08:33 UTC, rikki cattermole 
wrote:
 On 24/09/2016 7:20 PM, Ilya Yaroshenko wrote:
 Yesterday I announced [1] a blog post [2] about the Mir [3]
 benchmark. Intel MKL and Apple Accelerate were added to the
 benchmark today.

 Please help improve the blog post this weekend. It will be
 announced on Reddit.

 [1] 
 http://forum.dlang.org/thread/yhfbuxnrqkiqtvsnvngf forum.dlang.org
 [2]
 http://blog.mir.dlang.io/glas/benchmark/openblas/2016/09/23/glas-gemm-benchmark.html

 [3] http://mir.dlang.io
For giggles, can we get a comparison against dmc for Intel MKL, assuming of course it compiles?
Intel MKL is closed source. At the same time, I don't think the compiler makes a difference for OpenBLAS, Intel MKL, and Apple Accelerate, because their computation kernels are written in assembler.
Sep 24 2016
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 9/24/16 3:20 AM, Ilya Yaroshenko wrote:
 Yesterday I announced [1] a blog post [2] about the Mir [3]
 benchmark. Intel MKL and Apple Accelerate were added to the
 benchmark today.

 Please help improve the blog post this weekend. It will be
 announced on Reddit.

 [1] http://forum.dlang.org/thread/yhfbuxnrqkiqtvsnvngf forum.dlang.org
 [2]
 http://blog.mir.dlang.io/glas/benchmark/openblas/2016/09/23/glas-gemm-benchmark.html

 [3] http://mir.dlang.io
Awesome. Good to see that most of the graphs have a nice blue envelope :o). Could you also add a comparison with SciPy? People often say it's just fine for scientific computing.

A few correx:

"The post represents performance benchmark" -> "This post presents performance benchmarks"

"most of numerical" -> "most numerical"

"for example Julia Programing Language" -> "for example the Julia Programming Language"

"Mir GLAS is Generic Linear Algebra Subroutines. It has single generic kernel for all targets, all floating point and complex types." -> "Mir GLAS (Generic Linear Algebra Subroutines) has a single generic kernel for all CPU targets, all floating point types, and all complex types."

"In addition, Mir GLAS Level 3 kernels are not unrolled and produce tiny binary code." -> "In addition, Mir GLAS Level 3 kernels are not unrolled and produce tiny binary code, so they put less pressure on the instruction cache in large applications."

"To add new architecture" -> "To add a new architecture"

"needs to extend small GLAS configuration file" -> "needs to extend one small GLAS configuration file"

"configuration is available for" -> "configurations are available for"

"Mir GLAS has native mir.ndslice interface." -> "Mir GLAS offers a native interface in module mir.ndslice."

"for almost all cases" -> "for virtually all benchmarks and parameters"

"Ilya is IT consultant, statistician. He has experience in distributed High Load services, business process analyses. He is the author of std.experimental.ndslice and Mir founder. He was a GSoC mentor for the D Language Foundation and Mir project." -> "Ilya is an IT consultant with a background in statistics. He has experience in distributed high-load services and business process analyses. He is the creator of the Mir library, including std.experimental.ndslice in the D Standard Library. He mentored a related GSoC project for the D Language Foundation."

Andrei
Sep 24 2016
next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Also the linkedin photo is much better than the one at the bottom of the 
benchmark page. -- Andrei
Sep 24 2016
prev sibling next sibling parent reply John Colvin <john.loughran.colvin gmail.com> writes:
On Saturday, 24 September 2016 at 12:52:09 UTC, Andrei 
Alexandrescu wrote:
 Could you also add a comparison with SciPy? People often say 
 it's just fine for scientific computing.
That's just BLAS (so could be mkl, could be openBLAS, could be netlib, etc. just depends on the system and compilation choices) under the hood, you'd just see a small overhead from the python wrapping. Basically, everyone uses a BLAS or Eigen. An Eigen comparison would be interesting.
Sep 24 2016
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 9/24/16 9:18 AM, John Colvin wrote:
 On Saturday, 24 September 2016 at 12:52:09 UTC, Andrei Alexandrescu wrote:
 Could you also add a comparison with SciPy? People often say it's just
 fine for scientific computing.
That's just BLAS (so could be mkl, could be openBLAS, could be netlib, etc. just depends on the system and compilation choices) under the hood, you'd just see a small overhead from the python wrapping. Basically, everyone uses a BLAS or Eigen.
I see, thanks. To the extent the Python-specific overheads are measurable, it might make sense to include the benchmark.
 An Eigen comparison would be interesting.
That'd be awesome especially since the article text refers to it. Andrei
Sep 24 2016
next sibling parent reply jmh530 <john.michael.hall gmail.com> writes:
On Saturday, 24 September 2016 at 13:49:35 UTC, Andrei 
Alexandrescu wrote:
 I see, thanks. To the extent the Python-specific overheads are 
 measurable, it might make sense to include the benchmark.
Here are some benchmarks from Eigen and Blaze for comparison:

http://eigen.tuxfamily.org/index.php?title=Benchmark
https://bitbucket.org/blaze-lib/blaze/wiki/Benchmarks

They don't include Python, for the reason mentioned above (no one would use a native Python implementation of matrix multiplication; it just calls some other library). I don't see a reason to include it here.
Sep 24 2016
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 09/24/2016 10:26 AM, jmh530 wrote:
 On Saturday, 24 September 2016 at 13:49:35 UTC, Andrei Alexandrescu wrote:
 I see, thanks. To the extent the Python-specific overheads are
 measurable, it might make sense to include the benchmark.
Here are some benchmarks from Eigen and Blaze for comparison: http://eigen.tuxfamily.org/index.php?title=Benchmark https://bitbucket.org/blaze-lib/blaze/wiki/Benchmarks They don't include Python, for the reason mentioned above (no one would use a native Python implementation of matrix multiplication; it just calls some other library). I don't see a reason to include it here.
OK. Yah, native Python wouldn't make sense. It may be worth mentioning that SciPy uses BLAS so it has the same performance profile. Also, a great idea for a followup would be a blog post comparing the source code for a typical linear algebra real-world task. The idea being, yes the D version has parity with Intel, but there _is_ a reason to switch to it because of its ease of use. Andrei
Sep 24 2016
prev sibling parent reply Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Saturday, 24 September 2016 at 13:49:35 UTC, Andrei 
Alexandrescu wrote:
 On 9/24/16 9:18 AM, John Colvin wrote:
 On Saturday, 24 September 2016 at 12:52:09 UTC, Andrei 
 Alexandrescu wrote:
 Could you also add a comparison with SciPy? People often say 
 it's just
 fine for scientific computing.
That's just BLAS (so could be mkl, could be openBLAS, could be netlib, etc. just depends on the system and compilation choices) under the hood, you'd just see a small overhead from the python wrapping. Basically, everyone uses a BLAS or Eigen.
I see, thanks. To the extent the Python-specific overheads are measurable, it might make sense to include the benchmark.
 An Eigen comparison would be interesting.
That'd be awesome especially since the article text refers to it. Andrei
Eigen was added (but only data, still need to write the text). Relative charts were added. You were added to the "Acknowledgements" section. --Ilya
Sep 24 2016
next sibling parent reply ZombineDev <petar.p.kirov gmail.com> writes:
On Saturday, 24 September 2016 at 17:46:07 UTC, Ilya Yaroshenko 
wrote:
 On Saturday, 24 September 2016 at 13:49:35 UTC, Andrei 
 Alexandrescu wrote:
 On 9/24/16 9:18 AM, John Colvin wrote:
 On Saturday, 24 September 2016 at 12:52:09 UTC, Andrei 
 Alexandrescu wrote:
 Could you also add a comparison with SciPy? People often say 
 it's just
 fine for scientific computing.
That's just BLAS (so could be mkl, could be openBLAS, could be netlib, etc. just depends on the system and compilation choices) under the hood, you'd just see a small overhead from the python wrapping. Basically, everyone uses a BLAS or Eigen.
I see, thanks. To the extent the Python-specific overheads are measurable, it might make sense to include the benchmark.
 An Eigen comparison would be interesting.
That'd be awesome especially since the article text refers to it. Andrei
Eigen was added (but only data, still need to write the text). Relative charts were added. You were added to the "Acknowledgements" section. --Ilya
It would also be interesting to compare the results to Blaze [1]. According to https://www.youtube.com/watch?v=hfn0BVOegac it is faster than Eigen and in some instances faster than even Intel MKL. [1]: https://bitbucket.org/blaze-lib/blaze
Sep 24 2016
parent Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Saturday, 24 September 2016 at 18:15:30 UTC, ZombineDev wrote:
 On Saturday, 24 September 2016 at 17:46:07 UTC, Ilya Yaroshenko 
 wrote:
 On Saturday, 24 September 2016 at 13:49:35 UTC, Andrei 
 Alexandrescu wrote:
 [...]
Eigen was added (but only data, still need to write the text). Relative charts were added. You were added to the "Acknowledgements" section. --Ilya
It would also be interesting to compare the results to Blaze [1]. According to https://www.youtube.com/watch?v=hfn0BVOegac it is faster than Eigen and in some instances faster than even Intel MKL. [1]: https://bitbucket.org/blaze-lib/blaze
It does not have a CBLAS interface like Eigen does, so additional effort is required.
Sep 25 2016
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 09/24/2016 01:46 PM, Ilya Yaroshenko wrote:
 Eigen was added (but only data, still need to write text). Relative
 charts was added.
Looks awesome. Couple more nits after one more pass:

"numerical and scientific projects" -> "numeric and scientific projects"

"OpenBLAS Haswell computation kernels" -> "The OpenBLAS Haswell computation kernels"

"To add a new architecture or target an engineer" -> "To add a new architecture or target, an engineer"

"configurations are available for X87, SSE2, AVX, and AVX2 instruction sets" -> "configurations are available for the X87, SSE2, AVX, and AVX2 instruction sets"

In the machine description, you may want to specify the amount of L2 cache (I think it's 6 MB).

Instead of "Recent" MKL, a version number would be more precise.

Relative performance plots should specify "percent", i.e. "Performance relative to Mir" -> "Performance relative to Mir [%]"

"General Matrix-matrix Multiplication" -> "General Matrix-Matrix Multiplication"

Andrei
Sep 24 2016
parent reply Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Saturday, 24 September 2016 at 19:01:47 UTC, Andrei 
Alexandrescu wrote:
 On 09/24/2016 01:46 PM, Ilya Yaroshenko wrote:
 [...]
Looks awesome. Couple more nits after one more pass: "numerical and scientific projects" -> "numeric and scientific projects" [...]
Thanks for the review! I have added notes about Eigen and a CBLAS interface example. Ilya
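As a rough illustration of why the CBLAS call shape is heavier than a native one (the cblas_sgemm prototype below is the standard CBLAS one, declared from D; the GLAS call at the end is a hypothetical sketch, not the library's exact signature):

    // The classic CBLAS C interface for single-precision GEMM.
    // The order/transpose arguments are C enums, ABI-compatible with int.
    extern (C) nothrow @nogc void cblas_sgemm(
        int order, int transA, int transB,
        int m, int n, int k,
        float alpha, const(float)* a, int lda,
        const(float)* b, int ldb,
        float beta, float* c, int ldc);

    // A native ndslice-based call carries the shapes and strides inside
    // the slices, so no raw pointers or leading-dimension bookkeeping:
    //     gemm(alpha, a, b, beta, c); // a, b, c are 2-dimensional Slices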
Sep 25 2016
next sibling parent reply =?UTF-8?Q?Ali_=c3=87ehreli?= <acehreli yahoo.com> writes:
On 09/25/2016 03:45 AM, Ilya Yaroshenko wrote:
 On Saturday, 24 September 2016 at 19:01:47 UTC, Andrei Alexandrescu wrote:
 On 09/24/2016 01:46 PM, Ilya Yaroshenko wrote:
 [...]
Looks awesome. Couple more nits after one more pass: "numerical and scientific projects" -> "numeric and scientific projects" [...]
Thanks for the review! I have added notes about Eigen and a CBLAS interface example. Ilya
Some more:

In the same time, CBLAS interface is unwieldy ->
On the other hand, CBLAS interface is unwieldy
(Or something better?)

GLAS calling conversion -> GLAS calling convention

single precisions -> single precision (Several occurrences)

double precisions -> double precision (Several occurrences)

Stay in touch with the lastest developments in scientific computing for D. ->
(I will let others recommend something better there but neither "stay in touch" nor "lastest" sounds right to my ears. :) )

Ali
Sep 25 2016
next sibling parent Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Sunday, 25 September 2016 at 23:03:27 UTC, Ali Çehreli wrote:
 On 09/25/2016 03:45 AM, Ilya Yaroshenko wrote:
 On Saturday, 24 September 2016 at 19:01:47 UTC, Andrei 
 Alexandrescu wrote:
 [...]
Thanks for the review! I have added notes about Eigen and a CBLAS interface example. Ilya
Some more:

In the same time, CBLAS interface is unwieldy ->
On the other hand, CBLAS interface is unwieldy
(Or something better?)

GLAS calling conversion -> GLAS calling convention

single precisions -> single precision (Several occurrences)

double precisions -> double precision (Several occurrences)

Stay in touch with the lastest developments in scientific computing for D. ->
(I will let others recommend something better there but neither "stay in touch" nor "lastest" sounds right to my ears. :) )

Ali
Thank you, fixed
Sep 25 2016
prev sibling parent Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Sunday, 25 September 2016 at 23:03:27 UTC, Ali Çehreli wrote:
 Stay in touch with the lastest developments in scientific 
 computing for D. ->
 (I will let others recommend something better there but neither 
 "stay in touch" nor "lastest" sounds right to my ears. :) )
"lastest" -> "latest" ... ?
Sep 26 2016
prev sibling parent reply Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Sunday, 25 September 2016 at 10:45:35 UTC, Ilya Yaroshenko 
wrote:
 Thanks for the review! I have added notes about Eigen and a
 CBLAS interface example.
One extra suggestion:

"Mir GLAS has native mir.ndslice interface" -> "Mir GLAS has a native mir.ndslice interface"

I would also suggest adding a small note on what `ndslice` is, e.g. "mir.ndslice is a development version of std.experimental.ndslice, which provides an N-dimensional equivalent of D's built-in array slicing."
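For readers who have never seen ndslice, a minimal sketch of that N-dimensional slicing (illustrative only):

    void main()
    {
        import std.experimental.ndslice;

        // View a flat array as a 3x4 matrix; no data is copied.
        auto matrix = new double[12].sliced(3, 4);

        // Multidimensional slicing, analogous to the built-in a[1 .. 3]:
        auto window = matrix[0 .. 2, 1 .. 3]; // a 2x2 sub-matrix view
    }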
Sep 26 2016
parent Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Monday, 26 September 2016 at 08:57:06 UTC, Joseph Rushton 
Wakeling wrote:
 On Sunday, 25 September 2016 at 10:45:35 UTC, Ilya Yaroshenko 
 wrote:
 Thanks for the review! I have added notes about Eigen and a
 CBLAS interface example.
One extra suggestion: "Mir GLAS has native mir.ndslice interface" -> "Mir GLAS has a native mir.ndslice interface" I would also suggest adding a small note on what `ndslice` is, e.g. "mir.ndslice is a development version of std.experimental.ndslice, which provides an N-dimensional equivalent of D's built-in array slicing."
Thank you, added
Sep 26 2016
prev sibling parent reply Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Saturday, 24 September 2016 at 13:18:14 UTC, John Colvin wrote:
 On Saturday, 24 September 2016 at 12:52:09 UTC, Andrei 
 Alexandrescu wrote:
 Could you also add a comparison with SciPy? People often say 
 it's just fine for scientific computing.
That's just BLAS (so could be mkl, could be openBLAS, could be netlib, etc. just depends on the system and compilation choices) under the hood, you'd just see a small overhead from the python wrapping. Basically, everyone uses a BLAS or Eigen. An Eigen comparison would be interesting.
Seems like libeigen_blas.dylib and libeigen_blas_static.a do not contain the _cblas_sgemm symbol, for example. Do they work for you?
Sep 24 2016
parent reply Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Saturday, 24 September 2016 at 14:59:32 UTC, Ilya Yaroshenko 
wrote:
 On Saturday, 24 September 2016 at 13:18:14 UTC, John Colvin 
 wrote:
 On Saturday, 24 September 2016 at 12:52:09 UTC, Andrei 
 Alexandrescu wrote:
 Could you also add a comparison with SciPy? People often say 
 it's just fine for scientific computing.
That's just BLAS (so could be mkl, could be openBLAS, could be netlib, etc. just depends on the system and compilation choices) under the hood, you'd just see a small overhead from the python wrapping. Basically, everyone uses a BLAS or Eigen. An Eigen comparison would be interesting.
Seems like libeigen_blas.dylib and libeigen_blas_static.a do not contain the _cblas_sgemm symbol, for example. Do they work for you?
Fixed with Netlib CBLAS
Sep 24 2016
parent dextorious <dextorious gmail.com> writes:
First of all, awesome work. It's great to see that it's possible 
to match or even exceed the performance of hand-crafted assembly 
implementations with generic code.

I would suggest adding more information on how the Eigen results 
were obtained. Unlike OpenBLAS, Eigen performance does often vary 
by compiler and varies greatly depending on the kind of 
preprocessor macros that are defined. In particular, 
EIGEN_NO_DEBUG is defined by default and reduces performance, 
EIGEN_FAST_MATH is not defined by default but can often increase 
performance and EIGEN_STACK_ALLOCATION_LIMIT matters greatly for 
performance on very small matrices (where MKL and especially 
OpenBLAS are very inefficient). It's been a while since I've used 
Eigen, so I may have forgotten one or two.

It may also be worth noting in the blog post that these are all 
single-threaded comparisons and that multithreaded implementations 
are on the way.
development of Mir, but a general audience on Reddit will likely 
point it out as a deficiency unless stated upfront.
Sep 24 2016
prev sibling parent Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Saturday, 24 September 2016 at 12:52:09 UTC, Andrei 
Alexandrescu wrote:
 On 9/24/16 3:20 AM, Ilya Yaroshenko wrote:
 [...]
Awesome. Good to see that most of the graphs have a nice blue envelope :o). Could you also add a comparison with SciPy? People often say it's just fine for scientific computing. [...]
Thank you!!! --Ilya
Sep 24 2016
prev sibling next sibling parent reply WebFreak001 <janju007 web.de> writes:
On Saturday, 24 September 2016 at 07:20:25 UTC, Ilya Yaroshenko 
wrote:
 Yesterday I announced [1] a blog post [2] about the Mir [3]
 benchmark. Intel MKL and Apple Accelerate were added to the
 benchmark today.

 Please help improve the blog post this weekend. It
 will be announced on Reddit.

 [1] 
 http://forum.dlang.org/thread/yhfbuxnrqkiqtvsnvngf forum.dlang.org
 [2] 
 http://blog.mir.dlang.io/glas/benchmark/openblas/2016/09/23/glas-gemm-benchmark.html
 [3] http://mir.dlang.io
I think you should put the Mir.GLAS graph in front of all the other graphs; right now they are overlapping it. It would probably look a bit better if Mir.GLAS were in front.
Sep 24 2016
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 9/24/16 8:59 AM, WebFreak001 wrote:
 On Saturday, 24 September 2016 at 07:20:25 UTC, Ilya Yaroshenko wrote:
 Yesterday I announced [1] a blog post [2] about the Mir [3]
 benchmark. Intel MKL and Apple Accelerate were added to the
 benchmark today.

 Please help improve the blog post this weekend. It will be
 announced on Reddit.

 [1] http://forum.dlang.org/thread/yhfbuxnrqkiqtvsnvngf forum.dlang.org
 [2]
 http://blog.mir.dlang.io/glas/benchmark/openblas/2016/09/23/glas-gemm-benchmark.html

 [3] http://mir.dlang.io
I think you should put the Mir.GLAS graph in front of all the other graphs; right now they are overlapping it. It would probably look a bit better if Mir.GLAS were in front.
Also, one other class of plots that would be informative: performance of all other libraries normalized to Mir. The Y axis would be in percentages with Mir at 100%. Then people can easily see what relative gains to expect. -- Andrei
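In other words, each point would be computed as something like the following (hypothetical helper, just to pin down the arithmetic):

    // Mir's own throughput maps to 100%; every other library is scaled to it.
    double relativePercent(double libGflops, double mirGflops)
    {
        return 100.0 * libGflops / mirGflops;
    }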
Sep 24 2016
prev sibling parent reply Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Saturday, 24 September 2016 at 07:20:25 UTC, Ilya Yaroshenko 
wrote:
 Please help improve the blog post this weekend. It
 will be announced on Reddit.
One other place that a little more explanation could be helpful is this sentence: "It is written completely in D for LDC (LLVM D Compiler), without any assembler blocks." It would be nice to describe (if it can be summarized in a sentence) why Mir GLAS relies on LDC and/or LLVM, and what differences in outcome can be expected if one uses a different compiler (will it not work at all, or just not as well?). The broader topic of what compiler features Mir GLAS uses could be the topic of an entire blog post in its own right, and might be very interesting.
Sep 26 2016
next sibling parent reply Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Monday, 26 September 2016 at 11:11:20 UTC, Joseph Rushton 
Wakeling wrote:
 On Saturday, 24 September 2016 at 07:20:25 UTC, Ilya Yaroshenko 
 wrote:
 Please help improve the blog post this weekend. It
 will be announced on Reddit.
One other place that a little more explanation could be helpful is this sentence: "It is written completely in D for LDC (LLVM D Compiler), without any assembler blocks." It would be nice to describe (if it can be summarized in a sentence) why Mir GLAS relies on LDC and/or LLVM, and what differences in outcome can be expected if one uses a different compiler (will it not work at all, or just not as well?). The broader topic of what compiler features Mir GLAS uses could be the topic of an entire blog post in its own right, and might be very interesting.
Updated: Mir is LLVM-Accelerated Generic Numerical Library for Science and Machine Learning. It requires LDC (LLVM D Compiler) for compilation. Mir GLAS (Generic Linear Algebra Subprograms) has a single generic kernel for all CPU targets, all floating point types, and all complex types. It is written completely in D, without any assembler blocks. In addition, Mir GLAS Level 3 kernels are not unrolled and produce tiny binary code, so they put less pressure on the instruction cache in large applications.
Sep 26 2016
next sibling parent reply Edwin van Leeuwen <edder tkwsping.nl> writes:
On Monday, 26 September 2016 at 11:32:20 UTC, Ilya Yaroshenko 
wrote:
 Updated:
 Mir is LLVM-Accelerated Generic Numerical Library for Science 
 and Machine Learning. It requires LDC (LLVM D Compiler) for 
 compilation.
It doesn't really require LDC though, it just requires it to get good performance? I can still use DMD for quick testing?
Sep 26 2016
next sibling parent Edwin van Leeuwen <edder tkwsping.nl> writes:
On Monday, 26 September 2016 at 11:36:11 UTC, Edwin van Leeuwen 
wrote:
 On Monday, 26 September 2016 at 11:32:20 UTC, Ilya Yaroshenko 
 wrote:
 Updated:
 Mir is LLVM-Accelerated Generic Numerical Library for Science 
 and Machine Learning. It requires LDC (LLVM D Compiler) for 
 compilation.
It doesn't really require LDC though, it just requires it to get good performance? I can still use DMD for quick testing?
I would say something like: For optimal performance it should be compiled using LDC.
Sep 26 2016
prev sibling parent Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Monday, 26 September 2016 at 11:36:11 UTC, Edwin van Leeuwen 
wrote:
 On Monday, 26 September 2016 at 11:32:20 UTC, Ilya Yaroshenko 
 wrote:
 Updated:
 Mir is LLVM-Accelerated Generic Numerical Library for Science 
 and Machine Learning. It requires LDC (LLVM D Compiler) for 
 compilation.
It doesn't really require LDC though, it just requires it to get good performance? I can still use DMD for quick testing?
No, LDC is required. I plan to update the DUB package for quick testing without binary compilation. The reason DMD support was dropped is that it generates 10-20 times slower code for matrix multiplication. My opinion is that the D community is too small to maintain 3 compilers, and we should move forward with LDC. Ilya
Sep 26 2016
prev sibling parent reply Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Monday, 26 September 2016 at 11:32:20 UTC, Ilya Yaroshenko 
wrote:
 Updated:
 Mir is LLVM-Accelerated Generic Numerical Library for Science 
 and Machine Learning. It requires LDC (LLVM D Compiler) for 
 compilation. Mir GLAS (Generic Linear Algebra Subprograms) has 
 a single generic kernel for all CPU targets, all floating point 
 types, and all complex types. It is written completely in D, 
 without any assembler blocks. In addition, Mir GLAS Level 3 
 kernels are not unrolled and produce tiny binary code, so they 
 put less pressure on the instruction cache in large 
 applications.
Hmmm, I was thinking more along the lines of just describing (very briefly) what features of LLVM Mir GLAS relies on. But I think this might run the risk of endless re-revision. One minor tweak: "Mir is LLVM-Accelerated Generic Numerical Library" -> "Mir is an LLVM-Accelerated Generic Numerical Library"
Sep 26 2016
parent Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Monday, 26 September 2016 at 12:20:25 UTC, Joseph Rushton 
Wakeling wrote:
 "Mir is LLVM-Accelerated Generic Numerical Library" -> "Mir is 
 an LLVM-Accelerated Generic Numerical Library"
Thanks, fixed
Sep 26 2016
prev sibling parent reply Johan Engelen <j j.nl> writes:
On Monday, 26 September 2016 at 11:11:20 UTC, Joseph Rushton 
Wakeling wrote:
 The broader topic of what compiler features Mir GLAS uses could 
 be the topic of an entire blog post in its own right, and might 
 be very interesting.
I guess this is my terrain. I'll think about writing that blog post :)

Specific LDC features that I see in GLAS are:
- __traits(targetHasFeature, ...), see https://wiki.dlang.org/LDC-specific_language_changes#targetHasFeature
- fastmath, see https://wiki.dlang.org/LDC-specific_language_changes#.40.28ldc.attributes.fastmath.29
- Modules ldc.simd and ldc.intrinsics.
- Extended allowed sizes for __vector (still very limited)

To get an idea of what is different for LDC and DMD, this PR removed support for DMD: https://github.com/libmir/mir/pull/347

-Johan
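A minimal sketch of the first two features in action (illustrative only; these are LDC extensions, hence the version (LDC) guard):

    version (LDC)
    {
        import ldc.attributes : fastmath;

        // Compile-time query of the target CPU's instruction set (LDC extension).
        enum hasAVX2 = __traits(targetHasFeature, "avx2");

        // @fastmath relaxes strict IEEE floating-point semantics for this
        // function, letting LLVM vectorize the loop and fuse multiply-adds.
        @fastmath double dot(const(double)[] a, const(double)[] b)
        {
            double s = 0;
            foreach (i; 0 .. a.length)
                s += a[i] * b[i];
            return s;
        }
    }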
Sep 26 2016
parent reply Edwin van Leeuwen <edder tkwsping.nl> writes:
On Monday, 26 September 2016 at 11:46:19 UTC, Johan Engelen wrote:
 On Monday, 26 September 2016 at 11:11:20 UTC, Joseph Rushton 
 Wakeling wrote:
 The broader topic of what compiler features Mir GLAS uses 
 could be the topic of an entire blog post in its own right, 
 and might be very interesting.
I guess this is my terrain. I'll think about writing that blog post :)

Specific LDC features that I see in GLAS are:
- __traits(targetHasFeature, ...), see https://wiki.dlang.org/LDC-specific_language_changes#targetHasFeature
- fastmath, see https://wiki.dlang.org/LDC-specific_language_changes#.40.28ldc.attributes.fastmath.29
- Modules ldc.simd and ldc.intrinsics.
- Extended allowed sizes for __vector (still very limited)

To get an idea of what is different for LDC and DMD, this PR removed support for DMD: https://github.com/libmir/mir/pull/347

-Johan
Ah, I was not aware that DMD support was dropped completely. I think that is a real shame, and it makes it _much_ less likely that I will use mir in my own projects, let alone as a dependency in another library.
Sep 26 2016
next sibling parent reply Johan Engelen <j j.nl> writes:
On Monday, 26 September 2016 at 11:56:39 UTC, Edwin van Leeuwen 
wrote:
 
 Ah, I was not aware that DMD support was dropped completely. I 
 think that is a real shame, and it makes it _much_ less likely 
 that I will use mir in my own projects, let alone as a 
 dependency in another library.
"_much_" :'( :'( Please don't write that to LDC devs.
Sep 26 2016
parent Edwin van Leeuwen <edder tkwsping.nl> writes:
On Monday, 26 September 2016 at 11:59:57 UTC, Johan Engelen wrote:
 On Monday, 26 September 2016 at 11:56:39 UTC, Edwin van Leeuwen 
 wrote:
 
 Ah, I was not aware that DMD support was dropped completely. I 
 think that is a real shame, and it makes it _much_ less likely 
 that I will use mir in my own projects, let alone as a 
 dependency in another library.
"_much_" :'( :'( Please don't write that to LDC devs.
I love LDC; I just also tend to use DMD for testing, and won't force people to use ldc over dmd if they want to use a library I build.
Sep 26 2016
prev sibling parent reply Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Monday, 26 September 2016 at 11:56:39 UTC, Edwin van Leeuwen 
wrote:
 On Monday, 26 September 2016 at 11:46:19 UTC, Johan Engelen 
 wrote:
 On Monday, 26 September 2016 at 11:11:20 UTC, Joseph Rushton 
 Wakeling wrote:
 The broader topic of what compiler features Mir GLAS uses 
 could be the topic of an entire blog post in its own right, 
 and might be very interesting.
I guess this is my terrain. I'll think about writing that blog post :)

Specific LDC features that I see in GLAS are:
- __traits(targetHasFeature, ...), see https://wiki.dlang.org/LDC-specific_language_changes#targetHasFeature
- fastmath, see https://wiki.dlang.org/LDC-specific_language_changes#.40.28ldc.attributes.fastmath.29
- Modules ldc.simd and ldc.intrinsics.
- Extended allowed sizes for __vector (still very limited)

To get an idea of what is different for LDC and DMD, this PR removed support for DMD: https://github.com/libmir/mir/pull/347

-Johan
Ah, I was not aware that DMD support was dropped completely. I think that is a real shame, and it makes it _much_ less likely that I will use mir in my own projects, let alone as a dependency in another library.
The shame is that D is not popular. I think that Mir can replace C/C++ for high-performance applications and become the best industry systems language. My goal is not a package for the D community. My goal is a library for industry: a library that can bring in newcomers and extend the D community multiple times. Ilya
Sep 26 2016
next sibling parent Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Monday, 26 September 2016 at 12:11:16 UTC, Ilya Yaroshenko 
wrote:
 On Monday, 26 September 2016 at 11:56:39 UTC, Edwin van Leeuwen 
 wrote:
 On Monday, 26 September 2016 at 11:46:19 UTC, Johan Engelen 
 wrote:
 [...]
Ah, I was not aware that DMD support was dropped completely. I think that is a real shame, and it makes it _much_ less likely that I will use mir in my own projects, let alone as a dependency in another library.
The shame is that D is not popular. I think that Mir can replace C/C++ for high-performance applications and become the best industry systems language. My goal is not a package for the D community. My goal is a library for industry: a library that can bring in newcomers and extend the D community multiple times. Ilya
EDIT: that Mir can help D to replace ...
Sep 26 2016
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 9/26/16 2:11 PM, Ilya Yaroshenko wrote:
 On Monday, 26 September 2016 at 11:56:39 UTC, Edwin van Leeuwen wrote:
 On Monday, 26 September 2016 at 11:46:19 UTC, Johan Engelen wrote:
 On Monday, 26 September 2016 at 11:11:20 UTC, Joseph Rushton Wakeling
 wrote:
 The broader topic of what compiler features Mir GLAS uses could be
 the topic of an entire blog post in its own right, and might be very
 interesting.
I guess this is my terrain. I'll think about writing that blog post :)

Specific LDC features that I see in GLAS are:
- __traits(targetHasFeature, ...), see https://wiki.dlang.org/LDC-specific_language_changes#targetHasFeature
- fastmath, see https://wiki.dlang.org/LDC-specific_language_changes#.40.28ldc.attributes.fastmath.29
- Modules ldc.simd and ldc.intrinsics.
- Extended allowed sizes for __vector (still very limited)

To get an idea of what is different for LDC and DMD, this PR removed support for DMD: https://github.com/libmir/mir/pull/347

-Johan
Ah, I was not aware that DMD support was dropped completely. I think that is a real shame, and it makes it _much_ less likely that I will use mir in my own projects, let alone as a dependency in another library.
The shame is that D is not popular. I think that Mir can replace C/C++ for high-performance applications and become the best industry systems language. My goal is not a package for the D community. My goal is a library for industry: a library that can bring in newcomers and extend the D community multiple times. Ilya
I think we need to make it a point to support Mir in dmd. -- Andrei
Sep 26 2016
next sibling parent reply jmh530 <john.michael.hall gmail.com> writes:
On Monday, 26 September 2016 at 16:55:02 UTC, Andrei Alexandrescu 
wrote:
 I think we need to make it a point to support Mir in dmd. -- 
 Andrei
+1, even if it's slow.
Sep 26 2016
parent Johan Engelen <j j.nl> writes:
On Monday, 26 September 2016 at 18:27:15 UTC, jmh530 wrote:
 On Monday, 26 September 2016 at 16:55:02 UTC, Andrei 
 Alexandrescu wrote:
 I think we need to make it a point to support Mir in dmd. -- 
 Andrei
+1, even if it's slow.
I thought so too, but if the algorithm is 50x slower, it probably means you can't develop that algorithm any more (I wouldn't). I think the common use case for Mir is a calculation that takes seconds, so 50x turns a test into a run of several minutes (defeating the compilation-speed advantage of DMD). It is easy to want something, but someone else has to do it and live with it too. It's up to the Mir devs (**volunteers!**) to choose which compilers they support. As you can see from the PR that removed DMD support, the extra burden is substantial.
Sep 26 2016
prev sibling parent Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Monday, 26 September 2016 at 16:55:02 UTC, Andrei Alexandrescu 
wrote:
 I think we need to make it a point to support Mir in dmd. -- 
 Andrei
new thread https://forum.dlang.org/thread/pqgtvxklmedxuztopwiq forum.dlang.org
Sep 26 2016