
digitalmars.D - Not auto-vectorization

reply "bearophile" <bearophileHUGS lycos.com> writes:
On Reddit they have linked an article that shows auto 
vectorization in GCC 4.7:

http://locklessinc.com/articles/vectorize/

http://www.reddit.com/r/programming/comments/tz6ml/autovectorization_with_gcc_47/

GCC is good: it knows many tricks and contains a lot of pattern-
matching code and other machinery to enable such vectorizations. 
That C code is also almost transparent and standard (restrict is 
standard C99, and I think __builtin_assume_aligned isn't too hard 
to #define away when it's not available. Something like 
-ffast-math is available on most compilers, even though Walter 
doesn't like it). So it's also good for optimizing legacy C code.
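
A minimal sketch of what "#define away" might look like in practice. The macro name ASSUME_ALIGNED and the function add_arrays are hypothetical and are not taken from the linked article; the 16-byte alignment is an assumption chosen to match SSE registers.

```c
#include <stddef.h>

/* ASSUME_ALIGNED is a hypothetical macro name, not taken from the article.
   It maps to the GCC builtin when available and to a no-op otherwise. */
#if defined(__has_builtin)
#  if __has_builtin(__builtin_assume_aligned)
#    define ASSUME_ALIGNED(p, n) __builtin_assume_aligned((p), (n))
#  endif
#endif
#if !defined(ASSUME_ALIGNED) && defined(__GNUC__) && !defined(__clang__) && \
    (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 7))
#  define ASSUME_ALIGNED(p, n) __builtin_assume_aligned((p), (n))
#endif
#ifndef ASSUME_ALIGNED
#  define ASSUME_ALIGNED(p, n) (p)   /* fallback: plain pointer, no hint */
#endif

/* Element-wise sum. restrict promises the arrays do not overlap.
   Precondition: all three pointers are 16-byte aligned; passing an
   unaligned pointer is undefined behavior when the builtin is active. */
void add_arrays(float *restrict out, const float *restrict a,
                const float *restrict b, size_t n)
{
    float *x = ASSUME_ALIGNED(out, 16);
    const float *y = ASSUME_ALIGNED(a, 16);
    const float *z = ASSUME_ALIGNED(b, 16);

    for (size_t i = 0; i < n; i++)
        x[i] = y[i] + z[i];
}
```

On a compiler without the builtin the code still compiles and behaves the same; only the alignment hint is lost, so whether the loop gets vectorized is up to that compiler.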

But this article also shows why such a strategy is not usable for 
serious purposes. If small changes risk turning off such major 
optimizations, you can't rely on them much. More generally, 
writing low-level code and hoping the compiler recovers the high-
level semantics of the code is a bit ridiculous. It's far better 
to express those semantics more directly, in a standard way 
that's understood by all compilers of a language (all the more so 
because the code shown in that article has very simple semantics).
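
As a rough illustration of stating the SIMD intent directly instead of hoping for auto-vectorization, here is a sketch using the vector_size attribute. This is a GCC/Clang extension, not standard C, so it does not meet the "understood by all compilers" goal; the names float4 and add_arrays_v4 are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Four packed floats. vector_size is a GCC/Clang extension;
   float4 is a hypothetical name chosen for this sketch. */
typedef float float4 __attribute__((vector_size(16)));

/* Element-wise sum with the 4-lane addition written explicitly. */
void add_arrays_v4(float *out, const float *a, const float *b, size_t n)
{
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        float4 va, vb, vr;
        memcpy(&va, a + i, sizeof va);    /* load without alignment assumptions */
        memcpy(&vb, b + i, sizeof vb);
        vr = va + vb;                     /* one explicit 4-lane add */
        memcpy(out + i, &vr, sizeof vr);  /* store the four results */
    }

    for (; i < n; i++)                    /* scalar tail for leftover elements */
        out[i] = a[i] + b[i];
}
```

Here the vector operation is part of the source, so a small edit elsewhere in the loop cannot silently turn it back into scalar code.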

How is the development of the D SIMD ops going? Are those efforts 
(maybe with the help of another higher level Phobos lib) going to 
avoid the silly problems shown in that article?

Bye,
bearophile
May 22 2012
next sibling parent Denis Shelomovskij <verylonglogin.reg gmail.com> writes:
22.05.2012 22:52, bearophile wrote:
 How is the development of the D SIMD ops going? Are those efforts (maybe
 with the help of another higher level Phobos lib) going to avoid the
 silly problems shown in that article?

So the question is: do we need an `aligned(T)(T)` function? It could work like the current `scoped` implementation, e.g.:
https://github.com/D-Programming-Language/phobos/pull/570/files#L0R3096

Thoughts?

-- 
Денис В. Шеломовский
Denis V. Shelomovskij
May 22 2012
prev sibling next sibling parent Andrew Wiley <wiley.andrew.j gmail.com> writes:

On Tue, May 22, 2012 at 1:52 PM, bearophile <bearophileHUGS lycos.com> wrote:

 On Reddit they have linked an article that shows auto vectorization in GCC
 4.7:

 http://locklessinc.com/articles/vectorize/

 http://www.reddit.com/r/programming/comments/tz6ml/autovectorization_with_gcc_47/

 GCC is good: it knows many tricks and contains a lot of pattern-matching
 code and other machinery to enable such vectorizations. That C code is also
 almost transparent and standard (restrict is standard C99, and I think
 __builtin_assume_aligned isn't too hard to #define away when it's not
 available. Something like -ffast-math is available on most compilers,
 even though Walter doesn't like it). So it's also good for optimizing
 legacy C code.

 But this article also shows why such a strategy is not usable for serious
 purposes. If small changes risk turning off such major optimizations, you
 can't rely on them much. More generally, writing low-level code and hoping
 the compiler recovers the high-level semantics of the code is a bit
 ridiculous. It's far better to express those semantics more directly,
 in a standard way that's understood by all compilers of a language (all
 the more so because the code shown in that article has very simple semantics).

This is also why building a compiler that outputs C is a bad idea. Performance inevitably suffers because the C output must have semantic requirements that are the same as or tighter than those of the input code, and high-level optimizations are more difficult.
May 22 2012
prev sibling parent "Martin Nowak" <dawg dawgfoto.de> writes:
 GCC is good: it knows many tricks and contains a lot of pattern-matching
 code and other machinery to enable such vectorizations. That C code is
 also almost transparent and standard (restrict is standard C99, and I
 think __builtin_assume_aligned isn't too hard to #define away when it's
 not available. Something like -ffast-math is available on most compilers,
 even though Walter doesn't like it). So it's also good for optimizing
 legacy C code.

I was really surprised that all vectorization approaches seem to be restricted to loops. I'd think that loop unrolling + arithmetic vectorization should achieve most of what a specialized loop vectorizer does.

http://forum.dlang.org/post/jf1s30$14mj$1 digitalmars.com
https://github.com/D-Programming-Language/phobos/blob/master/std/numeric.d#L2329
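
A rough sketch of the "unrolling + arithmetic vectorization" idea in C, assuming a compiler with a straight-line (SLP) vectorizer such as the one GCC enables with -ftree-slp-vectorize. The function name scale_add_unrolled is hypothetical, and whether the four statements actually get merged into SIMD instructions depends on the compiler and flags.

```c
#include <stddef.h>

/* Computes out[i] = a[i] * k + b[i], hand-unrolled by four.
   The four statements in the loop body are independent and touch
   adjacent elements, so an SLP vectorizer may combine them into one
   SIMD multiply and add. That outcome is not guaranteed. */
void scale_add_unrolled(float *restrict out, const float *restrict a,
                        const float *restrict b, float k, size_t n)
{
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        out[i + 0] = a[i + 0] * k + b[i + 0];
        out[i + 1] = a[i + 1] * k + b[i + 1];
        out[i + 2] = a[i + 2] * k + b[i + 2];
        out[i + 3] = a[i + 3] * k + b[i + 3];
    }

    for (; i < n; i++)   /* scalar tail for the remaining elements */
        out[i] = a[i] * k + b[i];
}
```

The unrolled body is plain arithmetic with no loop-carried state, which is the case where vectorizing straight-line code and vectorizing the loop should give similar results.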
May 22 2012