digitalmars.D.learn - AVX for math code ... avx instructions later disappearing ?
- james.p.leblanc (65/65) Sep 26 2021 Dear D-ers,
- kinke (12/15) Sep 26 2021 That's because the `@fastmath` UDA applies to the next
- james.p.leblanc (12/27) Sep 26 2021 Kinke,
Dear D-ers,

I enjoyed reading some details of incorporating AVX into math code from Johan Engelen's programming blog post:

http://johanengelen.github.io/ldc/2016/10/11/Math-performance-LDC.html

Basically, one can use the ldc compiler to insert avx code, nice!

In playing with some variants of his example code, I realize that there are issues I do not understand. For example, the following code successfully incorporates the avx instructions:

```d
// File here is called dotFirst.d
import ldc.attributes : fastmath;

@fastmath
double dot(double[] a, double[] b)
{
    double s = 0.0;
    foreach (size_t i; 0 .. a.length)
    {
        s += a[i] * b[i];
    }
    return s;
}

double[8] x = [0.0, 1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7];
double[8] y = [0.0, 1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7];

void main()
{
    double z = 0.0;
    z = dot(x, y);
}
```

If we run:

    ldc2 -c -output-s -O3 -release dotFirst.d -mcpu=haswell
    echo "Results of grep ymm dotFirst.s:"
    grep ymm dotFirst.s

The "grep" shows a number of vector instructions, such as:

**vfmadd132pd 160(%rcx,%rdi,8), %ymm5, %ymm1**

However, subtle changes in the code (such as moving the dot product function to a module, or even moving the array declarations to before the dot product function), and the avx instructions will disappear!

```d
import ldc.attributes : fastmath;

@fastmath
double[8] x = [0.0, 1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7];
double[8] y = [0.0, 1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7];

double dot(double[] a, double[] b)
{
    double s = 0.0;
    foreach (size_t i; 0 .. a.length)
    {
        ...
```

Now a grep will not find a single **ymm**.

It is understood that ldc needs proper alignment to be able to do the vector instructions...

**But my question is:** how is proper alignment guaranteed? (Most importantly, how is it guaranteed among code using modules?) (There are related stack alignment issues -- 16?)

Best Regards,
James

PS I have come across scattered bits of (sometimes contradictory) information on avx/simd for dlang. Is there a canonical source for vector info?
Sep 26 2021
On Sunday, 26 September 2021 at 18:08:46 UTC, james.p.leblanc wrote:
> or even moving the array declarations to before the dot product function, and the avx instructions will disappear!

That's because the `@fastmath` UDA applies to the next declaration only, which is the `x` array in your 2nd example (where it obviously has no effect). Either use `@fastmath:` with the colon to apply it to the entire scope, or use `-ffast-math` in the LDC cmdline.

Similarly, when moving the function to another module and you don't include that module in the cmdline, it's only imported and not compiled and won't show up in the resulting assembly.

Wrt. stack alignment, there aren't any issues with LDC AFAIK (not limited to 16 or whatever like DMD).
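
To make the first suggestion concrete, here is a minimal sketch of the second example from the question, changed only by adding the colon. The file name `dotScoped.d` is made up for illustration, and this assumes the same LDC setup and flags as in the original post; whether `ymm` instructions actually appear still depends on the optimizer and target CPU.

```d
// File: dotScoped.d (hypothetical name, mirroring dotFirst.d from the question)
// Assumed build line, copied from the original post with only the file name changed:
//   ldc2 -c -output-s -O3 -release dotScoped.d -mcpu=haswell
import ldc.attributes : fastmath;

// With the trailing colon, the attribute applies to every declaration that
// follows in this scope, not just to the next one (which would be `x`).
@fastmath:

double[8] x = [0.0, 1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7];
double[8] y = [0.0, 1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7];

// `dot` is now covered by @fastmath even though the arrays are declared first.
double dot(double[] a, double[] b)
{
    double s = 0.0;
    foreach (size_t i; 0 .. a.length)
    {
        s += a[i] * b[i];
    }
    return s;
}

void main()
{
    // Slice the static arrays explicitly to match the double[] parameters.
    double z = dot(x[], y[]);
}
```

Putting `@fastmath` directly in front of `dot` would work just as well and keeps the attribute off unrelated declarations; the colon form is shown only because it is the variant mentioned in this reply.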
Sep 26 2021
On Sunday, 26 September 2021 at 19:00:54 UTC, kinke wrote:
> On Sunday, 26 September 2021 at 18:08:46 UTC, james.p.leblanc wrote:
>> or even moving the array declarations to before the dot product function, and the avx instructions will disappear!
>
> That's because the `@fastmath` UDA applies to the next declaration only, which is the `x` array in your 2nd example (where it obviously has no effect). Either use `@fastmath:` with the colon to apply it to the entire scope, or use `-ffast-math` in the LDC cmdline.
>
> Similarly, when moving the function to another module and you don't include that module in the cmdline, it's only imported and not compiled and won't show up in the resulting assembly.
>
> Wrt. stack alignment, there aren't any issues with LDC AFAIK (not limited to 16 or whatever like DMD).

Kinke,

Thanks very much for your response. There were many issues that I had been misunderstanding in my attempts. The provided explanation helped me understand the broader scope of what is happening.

(I never even thought about the fastmath UDA aspect! ... a bit embarrassing for me!) Using the -ffast-math in the LDC cmdline seems to be a most elegant solution.

Much appreciated!

Regards,
James
Sep 26 2021