Using st1 to copy 3 floats from NEON register

Hi,

I am trying to copy 3 floats from NEON register v8 to a C array, pOutVertex2. Unfortunately, I don't really understand the documentation on post-index addressing.

This is what I have:

"st1    {v8.d}[0], [%[pOutVertex2]], #8    \t\n"    // 64-bit lane 0 (floats 0 and 1) -> pOutVertex2[0..1], then advance 8 bytes
"st1    {v8.s}[2], [%[pOutVertex2]]        \t\n"    // float lane 2 -> pOutVertex2[2]

I was expecting that the "#8" would advance the address by 8 bytes (2 floats), but it doesn't seem to.

If anyone could give me advice on how to do this, I would be extremely grateful.

Ed

  • Update:

    I just realised that my problem was in a different part of the code, and the store code was actually correct.
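For anyone else puzzled by the post-index form, here is a minimal scalar sketch in plain C of what the two st1 instructions above do to memory and to the address register. The function name `store3_post_index` and the `lanes` array standing in for v8 are hypothetical, and the sketch assumes a 4-byte `float`; it is a model of the addressing, not NEON code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Scalar model of the two stores. `lanes` stands in for the four float
   lanes of v8 (hypothetical name); `out` plays the role of pOutVertex2.
   Returns the value the address register holds after both stores. */
float *store3_post_index(const float lanes[4], float *out)
{
    assert(lanes != NULL && out != NULL);

    unsigned char *p = (unsigned char *)out;

    /* st1 {v8.d}[0], [p], #8 : store the low 64 bits (lanes 0 and 1) at p... */
    memcpy(p, &lanes[0], 8);
    /* ...then the post-index immediate adds 8 bytes (2 floats) to p. */
    p += 8;

    /* st1 {v8.s}[2], [p] : store lane 2 at p; no post-index, so p is unchanged. */
    memcpy(p, &lanes[2], 4);

    return (float *)p;
}
```

Called with lanes {1, 2, 3, 4} and a four-element output array, this writes 1, 2, 3 into the first three elements, leaves the fourth untouched, and returns a pointer to element 2. In the inline asm, the same side effect means the pointer operand is modified, so it presumably needs to be declared as a read-write operand (for example with the "+r" constraint).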

    The problem was that I was using this to load 16 floats:

        "ld4     {v0.4s, v1.4s, v2.4s, v3.4s}, [%[m]]  \n\t"

    It seems to jumble the numbers around. I don't know why, but when I replace it with this, it works correctly:

            "ld1     {v0.4s}, [%[m]], #16  \n\t"
            "ld1     {v1.4s}, [%[m]], #16  \n\t"
            "ld1     {v2.4s}, [%[m]], #16  \n\t"
            "ld1     {v3.4s}, [%[m]], #16  \n\t"
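A likely explanation for the jumbled values: ld4 is a de-interleaving (structure) load, so consecutive floats in memory are dealt out across v0, v1, v2, v3 in turn rather than filling one register at a time. For a row-major 4x4 matrix this amounts to a transpose on load. The plain C sketch below models both layouts; the function names `model_ld4` and `model_ld1x4` and the `v[r][i]` array (register r, lane i) are hypothetical and only illustrate the element ordering.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar model of: ld4 {v0.4s, v1.4s, v2.4s, v3.4s}, [m]
   Memory element m[4*i + r] lands in lane i of register r (de-interleave). */
void model_ld4(const float m[16], float v[4][4])
{
    assert(m != NULL);
    for (size_t i = 0; i < 4; ++i)        /* lane index */
        for (size_t r = 0; r < 4; ++r)    /* register index */
            v[r][i] = m[4 * i + r];
}

/* Scalar model of four consecutive "ld1 {vR.4s}, [m], #16" loads:
   register r receives the contiguous floats m[4*r] .. m[4*r + 3]. */
void model_ld1x4(const float m[16], float v[4][4])
{
    assert(m != NULL);
    for (size_t r = 0; r < 4; ++r)        /* register index */
        for (size_t i = 0; i < 4; ++i)    /* lane index */
            v[r][i] = m[4 * r + i];
}
```

With m = {0, 1, ..., 15}, the ld4 model gives v0 = {0, 4, 8, 12} while the ld1 model gives v0 = {0, 1, 2, 3}. If the contiguous layout is what is wanted, ld1 also accepts a list of four registers, i.e. ld1 {v0.4s, v1.4s, v2.4s, v3.4s}, [m], which loads the same 64 bytes without de-interleaving. Whether that is faster than four separate ld1 instructions depends on the core and has not been measured here.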

    Just wondering: is using four ld1 instructions slower than one ld4?