digitalmars.D - Mir GLAS vs Intel MKL: which is faster?

reply Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
Yesterday I announced [1] a blog post [2] about the Mir [3] benchmark.
Intel MKL and Apple Accelerate were added to the benchmark today.

Please help improve the blog post this weekend. It will
be announced on Reddit.

[1] 
http://forum.dlang.org/thread/yhfbuxnrqkiqtvsnvngf forum.dlang.org
[2] 
http://blog.mir.dlang.io/glas/benchmark/openblas/2016/09/23/glas-gemm-benchmark.html
[3] http://mir.dlang.io
Sep 24 2016
next sibling parent reply Martin Nowak <code dawg.eu> writes:
On Saturday, 24 September 2016 at 07:20:25 UTC, Ilya Yaroshenko 
wrote:
 Yesterday I announced [1] a blog post [2] about the Mir [3]
 benchmark. Intel MKL and Apple Accelerate were added to the
 benchmark today.

 Please help improve the blog post this weekend. It
 will be announced on Reddit.
Let me run that on a desktop machine before you publish the results; I have a Core i7-6700 w/ 2133 MHz DDR4 RAM here. Mobile CPUs often don't reproduce the same numbers, e.g. https://github.com/dlang/druntime/pull/1603#issuecomment-231543115.
Sep 24 2016
parent reply Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Saturday, 24 September 2016 at 08:13:22 UTC, Martin Nowak 
wrote:
 On Saturday, 24 September 2016 at 07:20:25 UTC, Ilya Yaroshenko 
 wrote:
 Yesterday I announced [1] a blog post [2] about the Mir [3]
 benchmark. Intel MKL and Apple Accelerate were added to the
 benchmark today.

 Please help improve the blog post this weekend. It
 will be announced on Reddit.
Let me run that on a desktop machine before you publish the results; I have a Core i7-6700 w/ 2133 MHz DDR4 RAM here. Mobile CPUs often don't reproduce the same numbers, e.g. https://github.com/dlang/druntime/pull/1603#issuecomment-231543115.
This will be a good addition, thank you! Please use `dub build ...` and then run the report at least 2 times, and choose the better one. GEMM uses the CPU cache intensively, and the OS and other apps may significantly hurt performance. So it makes sense to rerun the test if something looks wrong. Benchmark code: https://github.com/libmir/mir/blob/master/benchmarks/glas/gemm_report.d You can ping me on Gitter or open an issue if you need help with the benchmark setup. Gitter: https://gitter.im/libmir/public
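To make the "best of two runs" procedure concrete, here is a minimal D sketch of the idea (illustrative only; the real measurement logic lives in the linked gemm_report.d):

    import std.datetime.stopwatch : StopWatch, AutoStart;
    import std.algorithm.comparison : min;
    import core.time : Duration;

    // Time a single invocation of the workload.
    Duration timeOnce(scope void delegate() run)
    {
        auto sw = StopWatch(AutoStart.yes);
        run();
        return sw.peek;
    }

    // Keep the best (lowest) time over n runs: interference from the OS
    // and other apps only ever makes a run slower, so the minimum is the
    // closest estimate of the kernel's true speed.
    Duration bestOf(size_t n, scope void delegate() run)
    {
        auto best = Duration.max;
        foreach (_; 0 .. n)
            best = min(best, timeOnce(run));
        return best;
    }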
Sep 24 2016
parent reply Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Saturday, 24 September 2016 at 09:14:38 UTC, Ilya Yaroshenko 
wrote:
 Please use `dub build ...` and then run the report at least 2
 times, and choose the better one.
Is this what you mean by your description of the results as e.g. "single precision numbers x2", "double precision numbers x2", etc.? Might be better, instead of the "x2", to offer a small one-sentence description, e.g. "Each benchmark was run twice for each matrix size, and the better of the two runs was chosen in each case."
Sep 26 2016
parent reply Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Monday, 26 September 2016 at 09:46:50 UTC, Joseph Rushton 
Wakeling wrote:
 On Saturday, 24 September 2016 at 09:14:38 UTC, Ilya Yaroshenko 
 wrote:
 Please use `dub build ...` and then run the report at least 2
 times, and choose the better one.
Is this what you mean by your description of the results as e.g. "single precision numbers x2", "double precision numbers x2", etc.? Might be better, instead of the "x2", to offer a small one-sentence description, e.g. "Each benchmark was run twice for each matrix size, and the better of the two runs was chosen in each case."
I mean that for single precision numbers I have 2 charts (normal and normalized).
Sep 26 2016
parent reply Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Monday, 26 September 2016 at 10:01:44 UTC, Ilya Yaroshenko 
wrote:
 I mean that for single precision numbers I have 2 charts 
 (normal and normalized).
Ah, OK. Would still be nice to have a note, though, on how the numbers in the charts are generated, i.e. are they the result of a single run, best of N, average of N ... ?
Sep 26 2016
parent Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Monday, 26 September 2016 at 11:03:40 UTC, Joseph Rushton 
Wakeling wrote:
 On Monday, 26 September 2016 at 10:01:44 UTC, Ilya Yaroshenko 
 wrote:
 I mean that for single precision numbers I have 2 charts 
 (normal and normalized).
Ah, OK. Would still be nice to have a note, though, on how the numbers in the charts are generated, i.e. are they the result of a single run, best of N, average of N ... ?
The data is the same. The first chart represents absolute values, the second chart represents normalised values. Will add.
Sep 26 2016
prev sibling next sibling parent reply rikki cattermole <rikki cattermole.co.nz> writes:
On 24/09/2016 7:20 PM, Ilya Yaroshenko wrote:
 Yesterday I announced [1] a blog post [2] about the Mir [3]
 benchmark. Intel MKL and Apple Accelerate were added to the
 benchmark today.

 Please help improve the blog post this weekend. It will be
 announced on Reddit.

 [1] http://forum.dlang.org/thread/yhfbuxnrqkiqtvsnvngf forum.dlang.org
 [2]
 http://blog.mir.dlang.io/glas/benchmark/openblas/2016/09/23/glas-gemm-benchmark.html

 [3] http://mir.dlang.io
For giggles, can we get a comparison against dmc for Intel MKL, assuming of course it compiles?
Sep 24 2016
parent Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Saturday, 24 September 2016 at 12:08:33 UTC, rikki cattermole 
wrote:
 On 24/09/2016 7:20 PM, Ilya Yaroshenko wrote:
 Yesterday I announced [1] a blog post [2] about the Mir [3]
 benchmark. Intel MKL and Apple Accelerate were added to the
 benchmark today.

 Please help improve the blog post this weekend. It will be
 announced on Reddit.

 [1] 
 http://forum.dlang.org/thread/yhfbuxnrqkiqtvsnvngf forum.dlang.org
 [2]
 http://blog.mir.dlang.io/glas/benchmark/openblas/2016/09/23/glas-gemm-benchmark.html

 [3] http://mir.dlang.io
For giggles, can we get a comparison against dmc for Intel MKL, assuming of course it compiles?
Intel MKL is closed source. At the same time, I don't think the compiler makes a difference for OpenBLAS, Intel MKL, and Apple Accelerate, because their computation kernels are written in assembler.
Sep 24 2016
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 9/24/16 3:20 AM, Ilya Yaroshenko wrote:
 Yesterday I announced [1] a blog post [2] about the Mir [3]
 benchmark. Intel MKL and Apple Accelerate were added to the
 benchmark today.

 Please help improve the blog post this weekend. It will be
 announced on Reddit.

 [1] http://forum.dlang.org/thread/yhfbuxnrqkiqtvsnvngf forum.dlang.org
 [2]
 http://blog.mir.dlang.io/glas/benchmark/openblas/2016/09/23/glas-gemm-benchmark.html

 [3] http://mir.dlang.io
Awesome. Good to see that most of the graphs have a nice blue envelope :o). Could you also add a comparison with SciPy? People often say it's just fine for scientific computing.

A few correx:

"The post represents performance benchmark" -> "This post presents performance benchmarks"

"most of numerical" -> "most numerical"

"for example Julia Programing Language" -> "for example the Julia Programming Language"

"Mir GLAS is Generic Linear Algebra Subroutines. It has single generic kernel for all targets, all floating point and complex types." -> "Mir GLAS (Generic Linear Algebra Subroutines) has a single generic kernel for all CPU targets, all floating point types, and all complex types."

"In addition, Mir GLAS Level 3 kernels are not unrolled and produce tiny binary code." -> "In addition, Mir GLAS Level 3 kernels are not unrolled and produce tiny binary code, so they put less pressure on the instruction cache in large applications."

"To add new architecture" -> "To add a new architecture"

"needs to extend small GLAS configuration file" -> "needs to extend one small GLAS configuration file"

"configuration is available for" -> "configurations are available for"

"Mir GLAS has native mir.ndslice interface." -> "Mir GLAS offers a native interface in module mir.ndslice."

"for almost all cases" -> "for virtually all benchmarks and parameters"

"Ilya is IT consultant, statistician. He has experience in distributed High Load services, business process analyses. He is the author of std.experimental.ndslice and Mir founder. He was a GSoC mentor for the D Language Foundation and Mir project." -> "Ilya is an IT consultant with a background in statistics. He has experience in distributed high-load services and business process analyses. He is the creator of the Mir library, including std.experimental.ndslice in the D Standard Library. He mentored a related GSoC project for the D Language Foundation."

Andrei
Sep 24 2016
next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
Also the linkedin photo is much better than the one at the bottom of the 
benchmark page. -- Andrei
Sep 24 2016
prev sibling next sibling parent reply John Colvin <john.loughran.colvin gmail.com> writes:
On Saturday, 24 September 2016 at 12:52:09 UTC, Andrei 
Alexandrescu wrote:
 Could you also add a comparison with SciPy? People often say 
 it's just fine for scientific computing.
That's just BLAS (so could be mkl, could be openBLAS, could be netlib, etc. just depends on the system and compilation choices) under the hood, you'd just see a small overhead from the python wrapping. Basically, everyone uses a BLAS or Eigen. An Eigen comparison would be interesting.
Sep 24 2016
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 9/24/16 9:18 AM, John Colvin wrote:
 On Saturday, 24 September 2016 at 12:52:09 UTC, Andrei Alexandrescu wrote:
 Could you also add a comparison with SciPy? People often say it's just
 fine for scientific computing.
That's just BLAS (so could be mkl, could be openBLAS, could be netlib, etc. just depends on the system and compilation choices) under the hood, you'd just see a small overhead from the python wrapping. Basically, everyone uses a BLAS or Eigen.
I see, thanks. To the extent the Python-specific overheads are measurable, it might make sense to include the benchmark.
 An Eigen comparison would be interesting.
That'd be awesome especially since the article text refers to it. Andrei
Sep 24 2016
next sibling parent reply jmh530 <john.michael.hall gmail.com> writes:
On Saturday, 24 September 2016 at 13:49:35 UTC, Andrei 
Alexandrescu wrote:
 I see, thanks. To the extent the Python-specific overheads are 
 measurable, it might make sense to include the benchmark.
Here are some benchmarks from Eigen and Blaze for comparison:

http://eigen.tuxfamily.org/index.php?title=Benchmark
https://bitbucket.org/blaze-lib/blaze/wiki/Benchmarks

They don't include Python, for the reason mentioned above (no one would use a native Python implementation of matrix multiplication; it just calls some other library). I don't see a reason to include it here.
Sep 24 2016
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 09/24/2016 10:26 AM, jmh530 wrote:
 On Saturday, 24 September 2016 at 13:49:35 UTC, Andrei Alexandrescu wrote:
 I see, thanks. To the extent the Python-specific overheads are
 measurable, it might make sense to include the benchmark.
Here are some benchmarks from Eigen and Blaze for comparison: http://eigen.tuxfamily.org/index.php?title=Benchmark https://bitbucket.org/blaze-lib/blaze/wiki/Benchmarks They don't include Python, for the reason mentioned above (no one would use a native Python implementation of matrix multiplication; it just calls some other library). I don't see a reason to include it here.
OK. Yah, native Python wouldn't make sense. It may be worth mentioning that SciPy uses BLAS so it has the same performance profile. Also, a great idea for a followup would be a blog post comparing the source code for a typical linear algebra real-world task. The idea being, yes the D version has parity with Intel, but there _is_ a reason to switch to it because of its ease of use. Andrei
Sep 24 2016
prev sibling parent reply Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Saturday, 24 September 2016 at 13:49:35 UTC, Andrei 
Alexandrescu wrote:
 On 9/24/16 9:18 AM, John Colvin wrote:
 On Saturday, 24 September 2016 at 12:52:09 UTC, Andrei 
 Alexandrescu wrote:
 Could you also add a comparison with SciPy? People often say 
 it's just
 fine for scientific computing.
That's just BLAS (so could be mkl, could be openBLAS, could be netlib, etc. just depends on the system and compilation choices) under the hood, you'd just see a small overhead from the python wrapping. Basically, everyone uses a BLAS or Eigen.
I see, thanks. To the extent the Python-specific overheads are measurable, it might make sense to include the benchmark.
 An Eigen comparison would be interesting.
That'd be awesome especially since the article text refers to it. Andrei
Eigen was added (but only data, still need to write the text). Relative charts were added. You were added to the "Acknowledgements" section. --Ilya
Sep 24 2016
next sibling parent reply ZombineDev <petar.p.kirov gmail.com> writes:
On Saturday, 24 September 2016 at 17:46:07 UTC, Ilya Yaroshenko 
wrote:
 On Saturday, 24 September 2016 at 13:49:35 UTC, Andrei 
 Alexandrescu wrote:
 On 9/24/16 9:18 AM, John Colvin wrote:
 On Saturday, 24 September 2016 at 12:52:09 UTC, Andrei 
 Alexandrescu wrote:
 Could you also add a comparison with SciPy? People often say 
 it's just
 fine for scientific computing.
That's just BLAS (so could be mkl, could be openBLAS, could be netlib, etc. just depends on the system and compilation choices) under the hood, you'd just see a small overhead from the python wrapping. Basically, everyone uses a BLAS or Eigen.
I see, thanks. To the extent the Python-specific overheads are measurable, it might make sense to include the benchmark.
 An Eigen comparison would be interesting.
That'd be awesome especially since the article text refers to it. Andrei
Eigen was added (but only data, still need to write the text). Relative charts were added. You were added to the "Acknowledgements" section. --Ilya
It would also be interesting to compare the results to Blaze [1]. According to https://www.youtube.com/watch?v=hfn0BVOegac it is faster than Eigen and in some instances faster than even Intel MKL. [1]: https://bitbucket.org/blaze-lib/blaze
Sep 24 2016
parent Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Saturday, 24 September 2016 at 18:15:30 UTC, ZombineDev wrote:
 On Saturday, 24 September 2016 at 17:46:07 UTC, Ilya Yaroshenko 
 wrote:
 On Saturday, 24 September 2016 at 13:49:35 UTC, Andrei 
 Alexandrescu wrote:
 [...]
Eigen was added (but only data, still need to write the text). Relative charts were added. You were added to the "Acknowledgements" section. --Ilya
It would also be interesting to compare the results to Blaze [1]. According to https://www.youtube.com/watch?v=hfn0BVOegac it is faster than Eigen and in some instances faster than even Intel MKL. [1]: https://bitbucket.org/blaze-lib/blaze
It does not have a CBLAS interface like Eigen does, so additional effort is required.
Sep 25 2016
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 09/24/2016 01:46 PM, Ilya Yaroshenko wrote:
 Eigen was added (but only data, still need to write text). Relative
 charts was added.
Looks awesome. Couple more nits after one more pass:

"numerical and scientific projects" -> "numeric and scientific projects"

"OpenBLAS Haswell computation kernels" -> "The OpenBLAS Haswell computation kernels"

"To add a new architecture or target an engineer" -> "To add a new architecture or target, an engineer"

"configurations are available for X87, SSE2, AVX, and AVX2 instruction sets" -> "configurations are available for the X87, SSE2, AVX, and AVX2 instruction sets"

In the machine description, you may want to specify the amount of L2 cache (I think it's 6 MB).

Instead of "Recent" MKL, a version number would be more precise.

Relative performance plots should specify "percent", i.e. "Performance relative to Mir" -> "Performance relative to Mir [%]"

"General Matrix-matrix Multiplication" -> "General Matrix-Matrix Multiplication"

Andrei
Sep 24 2016
parent reply Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Saturday, 24 September 2016 at 19:01:47 UTC, Andrei 
Alexandrescu wrote:
 On 09/24/2016 01:46 PM, Ilya Yaroshenko wrote:
 [...]
Looks awesome. Couple more nits after one more pass: "numerical and scientific projects" -> "numeric and scientific projects" [...]
Thanks for the review! I have added notes about Eigen and a CBLAS interface example. Ilya
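As a rough illustration of why the CBLAS call shape is heavier than a native one (the cblas_sgemm prototype below is the standard CBLAS one, declared from D; the GLAS call at the end is a hypothetical sketch, not the library's exact signature):

    // The classic CBLAS C interface for single-precision GEMM.
    // The order/transpose arguments are C enums, ABI-compatible with int.
    extern (C) nothrow @nogc void cblas_sgemm(
        int order, int transA, int transB,
        int m, int n, int k,
        float alpha, const(float)* a, int lda,
        const(float)* b, int ldb,
        float beta, float* c, int ldc);

    // A native ndslice-based call carries the shapes and strides inside
    // the slices, so no raw pointers or leading-dimension bookkeeping:
    //     gemm(alpha, a, b, beta, c); // a, b, c are 2-dimensional Slices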
Sep 25 2016
next sibling parent reply =?UTF-8?Q?Ali_=c3=87ehreli?= <acehreli yahoo.com> writes:
On 09/25/2016 03:45 AM, Ilya Yaroshenko wrote:
 On Saturday, 24 September 2016 at 19:01:47 UTC, Andrei Alexandrescu wrote:
 On 09/24/2016 01:46 PM, Ilya Yaroshenko wrote:
 [...]
Looks awesome. Couple more nits after one more pass: "numerical and scientific projects" -> "numeric and scientific projects" [...]
Thanks for the review! I have added notes about Eigen and a CBLAS interface example. Ilya
Some more:

In the same time, CBLAS interface is unwieldy ->
On the other hand, CBLAS interface is unwieldy
(Or something better?)

GLAS calling conversion -> GLAS calling convention

single precisions -> single precision (Several occurrences)

double precisions -> double precision (Several occurrences)

Stay in touch with the lastest developments in scientific computing for D. ->
(I will let others recommend something better there but neither "stay in touch" nor "lastest" sounds right to my ears. :) )

Ali
Sep 25 2016
next sibling parent Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Sunday, 25 September 2016 at 23:03:27 UTC, Ali Çehreli wrote:
 On 09/25/2016 03:45 AM, Ilya Yaroshenko wrote:
 On Saturday, 24 September 2016 at 19:01:47 UTC, Andrei 
 Alexandrescu wrote:
 [...]
Thanks for the review! I have added notes about Eigen and a CBLAS interface example. Ilya
Some more:

In the same time, CBLAS interface is unwieldy ->
On the other hand, CBLAS interface is unwieldy
(Or something better?)

GLAS calling conversion -> GLAS calling convention

single precisions -> single precision (Several occurrences)

double precisions -> double precision (Several occurrences)

Stay in touch with the lastest developments in scientific computing for D. ->
(I will let others recommend something better there but neither "stay in touch" nor "lastest" sounds right to my ears. :) )

Ali
Thank you, fixed
Sep 25 2016
prev sibling parent Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Sunday, 25 September 2016 at 23:03:27 UTC, Ali Çehreli wrote:
 Stay in touch with the lastest developments in scientific 
 computing for D. ->
 (I will let others recommend something better there but neither 
 "stay in touch" nor "lastest" sounds right to my ears. :) )
"lastest" -> "latest" ... ?
Sep 26 2016
prev sibling parent reply Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Sunday, 25 September 2016 at 10:45:35 UTC, Ilya Yaroshenko 
wrote:
 Thanks for the review! I have added notes about Eigen and a
 CBLAS interface example.
One extra suggestion:

"Mir GLAS has native mir.ndslice interface" -> "Mir GLAS has a native mir.ndslice interface"

I would also suggest adding a small note on what `ndslice` is, e.g. "mir.ndslice is a development version of std.experimental.ndslice, which provides an N-dimensional equivalent of D's built-in array slicing."
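For readers who have never seen ndslice, a minimal sketch of that N-dimensional slicing (illustrative only):

    void main()
    {
        import std.experimental.ndslice;

        // View a flat array as a 3x4 matrix; no data is copied.
        auto matrix = new double[12].sliced(3, 4);

        // Multidimensional slicing, analogous to the built-in a[1 .. 3]:
        auto window = matrix[0 .. 2, 1 .. 3]; // a 2x2 sub-matrix view
    }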
Sep 26 2016
parent Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Monday, 26 September 2016 at 08:57:06 UTC, Joseph Rushton 
Wakeling wrote:
 On Sunday, 25 September 2016 at 10:45:35 UTC, Ilya Yaroshenko 
 wrote:
 Thanks for the review! I have added notes about Eigen and a
 CBLAS interface example.
One extra suggestion: "Mir GLAS has native mir.ndslice interface" -> "Mir GLAS has a native mir.ndslice interface" I would also suggest adding a small note on what `ndslice` is, e.g. "mir.ndslice is a development version of std.experimental.ndslice, which provides an N-dimensional equivalent of D's built-in array slicing."
Thank you, added
Sep 26 2016
prev sibling parent reply Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Saturday, 24 September 2016 at 13:18:14 UTC, John Colvin wrote:
 On Saturday, 24 September 2016 at 12:52:09 UTC, Andrei 
 Alexandrescu wrote:
 Could you also add a comparison with SciPy? People often say 
 it's just fine for scientific computing.
That's just BLAS (so could be mkl, could be openBLAS, could be netlib, etc. just depends on the system and compilation choices) under the hood, you'd just see a small overhead from the python wrapping. Basically, everyone uses a BLAS or Eigen. An Eigen comparison would be interesting.
Seems like libeigen_blas.dylib and libeigen_blas_static.a do not contain the _cblas_sgemm symbol, for example. Do they work for you?
Sep 24 2016
parent reply Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Saturday, 24 September 2016 at 14:59:32 UTC, Ilya Yaroshenko 
wrote:
 On Saturday, 24 September 2016 at 13:18:14 UTC, John Colvin 
 wrote:
 On Saturday, 24 September 2016 at 12:52:09 UTC, Andrei 
 Alexandrescu wrote:
 Could you also add a comparison with SciPy? People often say 
 it's just fine for scientific computing.
That's just BLAS (so could be mkl, could be openBLAS, could be netlib, etc. just depends on the system and compilation choices) under the hood, you'd just see a small overhead from the python wrapping. Basically, everyone uses a BLAS or Eigen. An Eigen comparison would be interesting.
Seems like libeigen_blas.dylib and libeigen_blas_static.a do not contain the _cblas_sgemm symbol, for example. Do they work for you?
Fixed with Netlib CBLAS
Sep 24 2016
parent dextorious <dextorious gmail.com> writes:
First of all, awesome work. It's great to see that it's possible 
to match or even exceed the performance of hand-crafted assembly 
implementations with generic code.

I would suggest adding more information on how the Eigen results 
were obtained. Unlike OpenBLAS, Eigen performance does often vary 
by compiler and varies greatly depending on the kind of 
preprocessor macros that are defined. In particular, 
EIGEN_NO_DEBUG is defined by default and reduces performance, 
EIGEN_FAST_MATH is not defined by default but can often increase 
performance and EIGEN_STACK_ALLOCATION_LIMIT matters greatly for 
performance on very small matrices (where MKL and especially 
OpenBLAS are very inefficient). It's been a while since I've used 
Eigen, so I may have forgotten one or two.

It may also be worth noting in the blog post that these are all 
single-threaded comparisons and that multithreaded implementations 
are on the way.
development of Mir, but a general audience on Reddit will likely 
point it out as a deficiency unless stated upfront.
Sep 24 2016
prev sibling parent Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Saturday, 24 September 2016 at 12:52:09 UTC, Andrei 
Alexandrescu wrote:
 On 9/24/16 3:20 AM, Ilya Yaroshenko wrote:
 [...]
Awesome. Good to see that most of the graphs have a nice blue envelope :o). Could you also add a comparison with SciPy? People often say it's just fine for scientific computing. [...]
Thank you!!! --Ilya
Sep 24 2016
prev sibling next sibling parent reply WebFreak001 <janju007 web.de> writes:
On Saturday, 24 September 2016 at 07:20:25 UTC, Ilya Yaroshenko 
wrote:
 Yesterday I announced [1] a blog post [2] about the Mir [3]
 benchmark. Intel MKL and Apple Accelerate were added to the
 benchmark today.

 Please help improve the blog post this weekend. It
 will be announced on Reddit.

 [1] 
 http://forum.dlang.org/thread/yhfbuxnrqkiqtvsnvngf forum.dlang.org
 [2] 
 http://blog.mir.dlang.io/glas/benchmark/openblas/2016/09/23/glas-gemm-benchmark.html
 [3] http://mir.dlang.io
I think you should put the Mir.GLAS graph in front of all the other graphs; right now they are overlapping it. It would probably look a bit better if Mir.GLAS were in front.
Sep 24 2016
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 9/24/16 8:59 AM, WebFreak001 wrote:
 On Saturday, 24 September 2016 at 07:20:25 UTC, Ilya Yaroshenko wrote:
 Yesterday I announced [1] a blog post [2] about the Mir [3]
 benchmark. Intel MKL and Apple Accelerate were added to the
 benchmark today.

 Please help improve the blog post this weekend. It will be
 announced on Reddit.

 [1] http://forum.dlang.org/thread/yhfbuxnrqkiqtvsnvngf forum.dlang.org
 [2]
 http://blog.mir.dlang.io/glas/benchmark/openblas/2016/09/23/glas-gemm-benchmark.html

 [3] http://mir.dlang.io
I think you should put the Mir.GLAS graph in front of all the other graphs; right now they are overlapping it. It would probably look a bit better if Mir.GLAS were in front.
Also, one other class of plots that would be informative: performance of all other libraries normalized to Mir. The Y axis would be in percentages with Mir at 100%. Then people can easily see what relative gains to expect. -- Andrei
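In other words, each point would be computed as something like the following (hypothetical helper, just to pin down the arithmetic):

    // Mir's own throughput maps to 100%; every other library is scaled to it.
    double relativePercent(double libGflops, double mirGflops)
    {
        return 100.0 * libGflops / mirGflops;
    }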
Sep 24 2016
prev sibling parent reply Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Saturday, 24 September 2016 at 07:20:25 UTC, Ilya Yaroshenko 
wrote:
 Please help improve the blog post this weekend. It
 will be announced on Reddit.
One other place that a little more explanation could be helpful is this sentence: "It is written completely in D for LDC (LLVM D Compiler), without any assembler blocks." It would be nice to describe (if it can be summarized in a sentence) why Mir GLAS relies on LDC and/or LLVM, and what differences in outcome can be expected if one uses a different compiler (will it not work at all, or just not as well?). The broader topic of what compiler features Mir GLAS uses could be the topic of an entire blog post in its own right, and might be very interesting.
Sep 26 2016
next sibling parent reply Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Monday, 26 September 2016 at 11:11:20 UTC, Joseph Rushton 
Wakeling wrote:
 On Saturday, 24 September 2016 at 07:20:25 UTC, Ilya Yaroshenko 
 wrote:
 Please help improve the blog post this weekend. It
 will be announced on Reddit.
One other place that a little more explanation could be helpful is this sentence: "It is written completely in D for LDC (LLVM D Compiler), without any assembler blocks." It would be nice to describe (if it can be summarized in a sentence) why Mir GLAS relies on LDC and/or LLVM, and what differences in outcome can be expected if one uses a different compiler (will it not work at all, or just not as well?). The broader topic of what compiler features Mir GLAS uses could be the topic of an entire blog post in its own right, and might be very interesting.
Updated: Mir is LLVM-Accelerated Generic Numerical Library for Science and Machine Learning. It requires LDC (LLVM D Compiler) for compilation. Mir GLAS (Generic Linear Algebra Subprograms) has a single generic kernel for all CPU targets, all floating point types, and all complex types. It is written completely in D, without any assembler blocks. In addition, Mir GLAS Level 3 kernels are not unrolled and produce tiny binary code, so they put less pressure on the instruction cache in large applications.
Sep 26 2016
next sibling parent reply Edwin van Leeuwen <edder tkwsping.nl> writes:
On Monday, 26 September 2016 at 11:32:20 UTC, Ilya Yaroshenko 
wrote:
 Updated:
 Mir is LLVM-Accelerated Generic Numerical Library for Science 
 and Machine Learning. It requires LDC (LLVM D Compiler) for 
 compilation.
It doesn't really require LDC though, it just requires it to get good performance? I can still use DMD for quick testing?
Sep 26 2016
next sibling parent Edwin van Leeuwen <edder tkwsping.nl> writes:
On Monday, 26 September 2016 at 11:36:11 UTC, Edwin van Leeuwen 
wrote:
 On Monday, 26 September 2016 at 11:32:20 UTC, Ilya Yaroshenko 
 wrote:
 Updated:
 Mir is LLVM-Accelerated Generic Numerical Library for Science 
 and Machine Learning. It requires LDC (LLVM D Compiler) for 
 compilation.
It doesn't really require LDC though, it just requires it to get good performance? I can still use DMD for quick testing?
I would say something like: For optimal performance it should be compiled using LDC.
Sep 26 2016
prev sibling parent Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Monday, 26 September 2016 at 11:36:11 UTC, Edwin van Leeuwen 
wrote:
 On Monday, 26 September 2016 at 11:32:20 UTC, Ilya Yaroshenko 
 wrote:
 Updated:
 Mir is LLVM-Accelerated Generic Numerical Library for Science 
 and Machine Learning. It requires LDC (LLVM D Compiler) for 
 compilation.
It doesn't really require LDC though, it just requires it to get good performance? I can still use DMD for quick testing?
No, LDC is required. I plan to update the DUB package for quick testing without binary compilation. The reason DMD support was dropped is that it generates 10-20 times slower code for matrix multiplication. My opinion is that the D community is too small to maintain 3 compilers, and we should move forward with LDC. Ilya
Sep 26 2016
prev sibling parent reply Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Monday, 26 September 2016 at 11:32:20 UTC, Ilya Yaroshenko 
wrote:
 Updated:
 Mir is LLVM-Accelerated Generic Numerical Library for Science 
 and Machine Learning. It requires LDC (LLVM D Compiler) for 
 compilation. Mir GLAS (Generic Linear Algebra Subprograms) has 
 a single generic kernel for all CPU targets, all floating point 
 types, and all complex types. It is written completely in D, 
 without any assembler blocks. In addition, Mir GLAS Level 3 
 kernels are not unrolled and produce tiny binary code, so they 
 put less pressure on the instruction cache in large 
 applications.
Hmmm, I was thinking more along the lines of just describing (very briefly) what features of LLVM Mir GLAS relies on. But I think this might run the risk of endless re-revision. One minor tweak: "Mir is LLVM-Accelerated Generic Numerical Library" -> "Mir is an LLVM-Accelerated Generic Numerical Library"
Sep 26 2016
parent Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Monday, 26 September 2016 at 12:20:25 UTC, Joseph Rushton 
Wakeling wrote:
 "Mir is LLVM-Accelerated Generic Numerical Library" -> "Mir is 
 an LLVM-Accelerated Generic Numerical Library"
Thanks, fixed
Sep 26 2016
prev sibling parent reply Johan Engelen <j j.nl> writes:
On Monday, 26 September 2016 at 11:11:20 UTC, Joseph Rushton 
Wakeling wrote:
 The broader topic of what compiler features Mir GLAS uses could 
 be the topic of an entire blog post in its own right, and might 
 be very interesting.
I guess this is my terrain. I'll think about writing that blog post :)

Specific LDC features that I see in GLAS are:
- __traits(targetHasFeature, ...), see https://wiki.dlang.org/LDC-specific_language_changes#targetHasFeature
- fastmath, see https://wiki.dlang.org/LDC-specific_language_changes#.40.28ldc.attributes.fastmath.29
- Modules ldc.simd and ldc.intrinsics.
- Extended allowed sizes for __vector (still very limited)

To get an idea of what is different for LDC and DMD, this PR removed support for DMD: https://github.com/libmir/mir/pull/347

-Johan
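A minimal sketch of the first two features in action (illustrative only; these are LDC extensions, hence the version (LDC) guard):

    version (LDC)
    {
        import ldc.attributes : fastmath;

        // Compile-time query of the target CPU's instruction set (LDC extension).
        enum hasAVX2 = __traits(targetHasFeature, "avx2");

        // @fastmath relaxes strict IEEE floating-point semantics for this
        // function, letting LLVM vectorize the loop and fuse multiply-adds.
        @fastmath double dot(const(double)[] a, const(double)[] b)
        {
            double s = 0;
            foreach (i; 0 .. a.length)
                s += a[i] * b[i];
            return s;
        }
    }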
Sep 26 2016
parent reply Edwin van Leeuwen <edder tkwsping.nl> writes:
On Monday, 26 September 2016 at 11:46:19 UTC, Johan Engelen wrote:
 On Monday, 26 September 2016 at 11:11:20 UTC, Joseph Rushton 
 Wakeling wrote:
 The broader topic of what compiler features Mir GLAS uses 
 could be the topic of an entire blog post in its own right, 
 and might be very interesting.
I guess this is my terrain. I'll think about writing that blog post :)

Specific LDC features that I see in GLAS are:
- __traits(targetHasFeature, ...), see https://wiki.dlang.org/LDC-specific_language_changes#targetHasFeature
- fastmath, see https://wiki.dlang.org/LDC-specific_language_changes#.40.28ldc.attributes.fastmath.29
- Modules ldc.simd and ldc.intrinsics.
- Extended allowed sizes for __vector (still very limited)

To get an idea of what is different for LDC and DMD, this PR removed support for DMD: https://github.com/libmir/mir/pull/347

-Johan
Ah, I was not aware that DMD support was dropped completely. I think that is a real shame, and it makes it _much_ less likely that I will use mir in my own projects, let alone as a dependency in another library.
Sep 26 2016
next sibling parent reply Johan Engelen <j j.nl> writes:
On Monday, 26 September 2016 at 11:56:39 UTC, Edwin van Leeuwen 
wrote:
 
 Ah, I was not aware that DMD support was dropped completely. I 
 think that is a real shame, and it makes it _much_ less likely 
 that I will use mir in my own projects, let alone as a 
 dependency in another library.
"_much_" :'( :'( Please don't write that to LDC devs.
Sep 26 2016
parent Edwin van Leeuwen <edder tkwsping.nl> writes:
On Monday, 26 September 2016 at 11:59:57 UTC, Johan Engelen wrote:
 On Monday, 26 September 2016 at 11:56:39 UTC, Edwin van Leeuwen 
 wrote:
 
 Ah, I was not aware that DMD support was dropped completely. I 
 think that is a real shame, and it makes it _much_ less likely 
 that I will use mir in my own projects, let alone as a 
 dependency in another library.
"_much_" :'( :'( Please don't write that to LDC devs.
I love LDC; I just also tend to use DMD for testing, and won't force people to use ldc over dmd if they want to use a library I build.
Sep 26 2016
prev sibling parent reply Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Monday, 26 September 2016 at 11:56:39 UTC, Edwin van Leeuwen 
wrote:
 On Monday, 26 September 2016 at 11:46:19 UTC, Johan Engelen 
 wrote:
 On Monday, 26 September 2016 at 11:11:20 UTC, Joseph Rushton 
 Wakeling wrote:
 The broader topic of what compiler features Mir GLAS uses 
 could be the topic of an entire blog post in its own right, 
 and might be very interesting.
I guess this is my terrain. I'll think about writing that blog post :)

Specific LDC features that I see in GLAS are:
- __traits(targetHasFeature, ...), see https://wiki.dlang.org/LDC-specific_language_changes#targetHasFeature
- fastmath, see https://wiki.dlang.org/LDC-specific_language_changes#.40.28ldc.attributes.fastmath.29
- Modules ldc.simd and ldc.intrinsics.
- Extended allowed sizes for __vector (still very limited)

To get an idea of what is different for LDC and DMD, this PR removed support for DMD: https://github.com/libmir/mir/pull/347

-Johan
Ah, I was not aware that DMD support was dropped completely. I think that is a real shame, and it makes it _much_ less likely that I will use mir in my own projects, let alone as a dependency in another library.
The shame is that D is not popular. I think that Mir can replace C/C++ for high-performance applications and become the best industry systems language. My goal is not a package for the D community. My goal is a library for industry: a library that can bring in newcomers and extend the D community multiple times. Ilya
Sep 26 2016
next sibling parent Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Monday, 26 September 2016 at 12:11:16 UTC, Ilya Yaroshenko 
wrote:
 On Monday, 26 September 2016 at 11:56:39 UTC, Edwin van Leeuwen 
 wrote:
 On Monday, 26 September 2016 at 11:46:19 UTC, Johan Engelen 
 wrote:
 [...]
Ah, I was not aware that DMD support was dropped completely. I think that is a real shame, and it makes it _much_ less likely that I will use mir in my own projects, let alone as a dependency in another library.
The shame is that D is not popular. I think that Mir can replace C/C++ for high-performance applications and become the best industry systems language. My goal is not a package for the D community. My goal is a library for industry: a library that can bring in newcomers and extend the D community multiple times. Ilya
EDIT: that Mir can help D to replace ...
Sep 26 2016
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 9/26/16 2:11 PM, Ilya Yaroshenko wrote:
 On Monday, 26 September 2016 at 11:56:39 UTC, Edwin van Leeuwen wrote:
 On Monday, 26 September 2016 at 11:46:19 UTC, Johan Engelen wrote:
 On Monday, 26 September 2016 at 11:11:20 UTC, Joseph Rushton Wakeling
 wrote:
 The broader topic of what compiler features Mir GLAS uses could be
 the topic of an entire blog post in its own right, and might be very
 interesting.
I guess this is my terrain. I'll think about writing that blog post :)

Specific LDC features that I see in GLAS are:
- __traits(targetHasFeature, ...), see https://wiki.dlang.org/LDC-specific_language_changes#targetHasFeature
- fastmath, see https://wiki.dlang.org/LDC-specific_language_changes#.40.28ldc.attributes.fastmath.29
- Modules ldc.simd and ldc.intrinsics.
- Extended allowed sizes for __vector (still very limited)

To get an idea of what is different for LDC and DMD, this PR removed support for DMD: https://github.com/libmir/mir/pull/347

-Johan
Ah, I was not aware that DMD support was dropped completely. I think that is a real shame, and it makes it _much_ less likely that I will use mir in my own projects, let alone as a dependency in another library.
The shame is that D is not popular. I think that Mir can replace C/C++ for high-performance applications and become the best industry systems language. My goal is not a package for the D community. My goal is a library for industry: a library that can bring in newcomers and extend the D community multiple times. Ilya
I think we need to make it a point to support Mir in dmd. -- Andrei
Sep 26 2016
next sibling parent reply jmh530 <john.michael.hall gmail.com> writes:
On Monday, 26 September 2016 at 16:55:02 UTC, Andrei Alexandrescu 
wrote:
 I think we need to make it a point to support Mir in dmd. -- 
 Andrei
+1, even if it's slow.
Sep 26 2016
parent Johan Engelen <j j.nl> writes:
On Monday, 26 September 2016 at 18:27:15 UTC, jmh530 wrote:
 On Monday, 26 September 2016 at 16:55:02 UTC, Andrei 
 Alexandrescu wrote:
 I think we need to make it a point to support Mir in dmd. -- 
 Andrei
+1, even if it's slow.
I thought so too, but if the algorithm is 50x slower, it probably means you can't develop that algorithm any more (I wouldn't). I think the common use case for Mir is a calculation that takes seconds, so 50x turns a test into a run of several minutes (defeating the compilation-speed advantage of DMD). It is easy to want something, but someone else has to do it and live with it too. It's up to the Mir devs (**volunteers!**) to choose which compilers they support. As you can see from the PR that removed DMD support, the extra burden is substantial.
Sep 26 2016
prev sibling parent Ilya Yaroshenko <ilyayaroshenko gmail.com> writes:
On Monday, 26 September 2016 at 16:55:02 UTC, Andrei Alexandrescu 
wrote:
 I think we need to make it a point to support Mir in dmd. -- 
 Andrei
new thread https://forum.dlang.org/thread/pqgtvxklmedxuztopwiq forum.dlang.org
Sep 26 2016