digitalmars.D - SIMD Progress?

Trevor Parscal <trevorparscal hotmail.com> writes:
I've been reading everything I can find about doing SIMD with D's inline
ASM.

For whatever reason, I've been able to get SIMD operations working on just
about every type of data - EXCEPT the ones I would find most useful... :(

Perhaps someone could look at this program (attached) and get the other 3
working...

Current Status

ASM Parallel Operations
- Dynamic Array (OK)
- Static Array (Not Working)
- Struct->Array on Heap (OK)
- Struct->Fields on Heap (OK)
- Struct->Array on Stack (Not Working)
- Struct->Fields on Stack (Not Working)
D Serial Operations
- Dynamic Array (OK)
- Static Array (OK)
- Struct->Array on Heap (OK)
- Struct->Fields on Heap (OK)
- Struct->Array on Stack (OK)
- Struct->Fields on Stack (OK)

It all has to do with alignment: D doesn't align data on the stack to the
16-byte boundary that aligned SSE loads and stores expect. I don't know of a
workaround, but perhaps there is one out there that has yet to surface.
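
For illustration, one commonly used approach to this kind of problem is to over-allocate a raw stack buffer and round a pointer into it up to the next 16-byte boundary. The sketch below is an assumption about how that could look, not code from the attached program; the names alignUp16 and addOnStack are hypothetical, and the inline ASM path assumes a compiler that supports DMD-style inline assembler on x86 or x86-64 (otherwise it falls back to a plain D array operation).

```d
import std.stdio;

/// Hypothetical helper: rounds a pointer up to the next 16-byte boundary.
float* alignUp16(void* p)
{
    return cast(float*)((cast(size_t) p + 15) & ~cast(size_t) 15);
}

/// Hypothetical example: adds two 4-float vectors held in manually
/// aligned stack storage and returns the element-wise sum.
float[4] addOnStack(float[4] x, float[4] y)
{
    // Each buffer is 15 bytes larger than the 16 bytes of payload, so a
    // 16-byte-aligned window of 16 bytes always fits somewhere inside it.
    ubyte[4 * float.sizeof + 15] rawA = void;
    ubyte[4 * float.sizeof + 15] rawB = void;

    float* a = alignUp16(rawA.ptr);
    float* b = alignUp16(rawB.ptr);

    // Copy the inputs into the aligned windows.
    a[0 .. 4] = x[];
    b[0 .. 4] = y[];

    version (D_InlineAsm_X86_64)
    {
        asm
        {
            mov RAX, a;          // load the aligned pointers
            mov RCX, b;
            movaps XMM0, [RAX];  // aligned load of 4 floats
            addps XMM0, [RCX];   // packed add with an aligned memory operand
            movaps [RAX], XMM0;  // aligned store of the result
        }
    }
    else version (D_InlineAsm_X86)
    {
        asm
        {
            mov EAX, a;
            mov ECX, b;
            movaps XMM0, [EAX];
            addps XMM0, [ECX];
            movaps [EAX], XMM0;
        }
    }
    else
    {
        // No DMD-style inline assembler available: plain D array operation.
        a[0 .. 4] += b[0 .. 4];
    }

    float[4] result = a[0 .. 4];
    return result;
}

void main()
{
    float[4] sum = addOnStack([1f, 2f, 3f, 4f], [10f, 20f, 30f, 40f]);
    writeln(sum);
}
```

The cost is 15 wasted bytes per buffer plus a copy in and out. If that is unwanted, the unaligned move movups accepts any address, though it may be slower than movaps on some processors.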

- Trevor
Mar 06 2008
bearophile <bearophileHUGS lycos.com> writes:
Trevor Parscal Wrote:
 For whatever reason, I've been able to get simd operations working on just
about every type of data - EXCEPT the ones I would find most useful... :(
I still have to read your code, but in the meantime, how many times does it run compared to normal code? (Sometimes running speed comes first.)

Bye,
bearophile
Mar 06 2008
Trevor Parscal <trevorparscal hotmail.com> writes:
bearophile Wrote:

 Trevor Parscal Wrote:
 For whatever reason, I've been able to get simd operations working on just
about every type of data - EXCEPT the ones I would find most useful... :(
I still have to read your code, but in the meantime, how many times does it run compared to normal code? (Sometimes running speed comes first.) Bye, bearophile
It runs 1 time for each method... I was hoping to get it working before I did any benchmarking. However, a benchmark I ran on the ASM version with dynamic arrays, performing vector addition on 4 floats a very large number of times, came out 300% faster. That's probably not the final word on performance, as I am sure the benchmark was highly flawed, but it seemed evident that there was a performance gain to be had.

- Trevor
Mar 06 2008