Hi,
I am trying to copy 3 floats from NEON register v8 to a C array, pOutVertex2. Unfortunately, I didn't really understand the documentation on post-index addressing.
This is what I have:
"st1 {v8.d}[0], [%[pOutVertex2]], #8 \t\n" // 64-bit store (two floats) to pOutVertex2[0] and pOutVertex2[1]
"st1 {v8.s}[2], [%[pOutVertex2]] \t\n" // pOutVertex2[2]
I was expecting that the "#8" would advance the address by 8 bytes (2 floats), but it doesn't seem to.
If anyone could give me advice on how to do this I would be extremely grateful,
Ed
Update:
I just realised that the problem I had was in a different part of the code and the store code was actually correct.
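For reference, here is a plain C sketch of what I understand the two stores to do. It is only a model, not NEON code: the helper names (store_low_d_post8, store_s_lane2, store3) are made up for illustration, the register is modelled as an array of 4 floats, and little-endian lane order is assumed. The "#8" post-index adds 8 bytes to the base register after the store, so the second store lands on the third float.

```c
#include <string.h>

/* Hypothetical model of: st1 {v8.d}[0], [ptr], #8
   Writes the low 64 bits of the register (float lanes 0 and 1) to ptr,
   then returns the base address advanced by 8 bytes (the post-index). */
static float *store_low_d_post8(const float reg[4], float *ptr)
{
    memcpy(ptr, reg, 8);                /* 8 bytes = two 32-bit floats */
    return (float *)((char *)ptr + 8);  /* base += 8 bytes after the store */
}

/* Hypothetical model of: st1 {v8.s}[2], [ptr]
   Writes float lane 2 to ptr and leaves the base address unchanged. */
static void store_s_lane2(const float reg[4], float *ptr)
{
    *ptr = reg[2];
}

/* Both stores in sequence: copies lanes 0..2 into out[0..2]. */
static void store3(const float reg[4], float *out)
{
    float *next = store_low_d_post8(reg, out);
    store_s_lane2(reg, next);
}
```

One thing to watch in the real inline asm: because the post-index form writes back to the base register, I believe the pointer has to be declared as a read-write operand (for example "+r") rather than an input-only one, otherwise the compiler may assume the register still holds the original address.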
The problem was that I was using this to load 16 floats:
"ld4 {v0.4s, v1.4s, v2.4s, v3.4s}, [%[m]] \n\t"
It seems to jumble the numbers around. I don't know why, but I replaced it with this instead and it works correctly:
"ld1 {v0.4s}, [%[m]], #16 \n\t"
"ld1 {v1.4s}, [%[m]], #16 \n\t"
"ld1 {v2.4s}, [%[m]], #16 \n\t"
"ld1 {v3.4s}, [%[m]], #16 \n\t"
Just wondering, is using 4 ld1 instructions slower than one ld4?
Actually, never mind. I figured it out.
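For anyone who finds this later: as far as I understand it, ld4 is a structure load that de-interleaves. Consecutive floats in memory are dealt out to v0, v1, v2, v3 in turn, so a 4x4 matrix ends up transposed across the registers, whereas ld1 fills each register with four consecutive floats. The sketch below models both in plain C. The function names (model_ld4, model_ld1x4) are hypothetical and v[r][lane] stands in for lane "lane" of register v"r".

```c
/* Hypothetical model of: ld4 {v0.4s, v1.4s, v2.4s, v3.4s}, [m]
   Element i of memory goes to register (i % 4), lane (i / 4),
   so v0 receives m[0], m[4], m[8], m[12]. */
static void model_ld4(const float m[16], float v[4][4])
{
    for (int i = 0; i < 16; ++i)
        v[i % 4][i / 4] = m[i];
}

/* Hypothetical model of four consecutive ld1 {vN.4s} loads.
   Element i of memory goes to register (i / 4), lane (i % 4),
   so v0 receives m[0], m[1], m[2], m[3]. */
static void model_ld1x4(const float m[16], float v[4][4])
{
    for (int i = 0; i < 16; ++i)
        v[i / 4][i % 4] = m[i];
}
```

So the two are not interchangeable: ld4 is the right choice when the data is an array of 4-element structures that should be split by field, and ld1 is the right choice for a straight copy. ld1 also accepts a list of four registers, e.g. "ld1 {v0.4s, v1.4s, v2.4s, v3.4s}, [%[m]]", which should load the same data as the four separate ld1 instructions in a single instruction.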