SIMD : VEXT -32-q1-q1-q1-3-slow

br-dev over 6 years ago

Hi, I have an issue with an in-place vext.32 instruction. I posted it on the Cortex-A forum. Does anyone have a tip or workaround, as it is too slow on A7, A8, A9, etc.? Thanks

vetx-32-q1-q1-q1-3-slow

Paul Black over 6 years ago

Hi,

Which compiler are you using?

Thanks,

Paul.

br-dev over 6 years ago in reply to Paul Black

Hi Paul, GCC 8.3, but I benchmarked on many Arm cores (A7, A8, A9, A53, A72) and the results are consistent (though not identical) and seem logical. However, I am not happy to spend so many cycles (5 on A7) on a single instruction. I would be grateful if you could ask the team for a workaround. Cheers, Bruno

Peterson Quadros over 6 years ago in reply to br-dev

Hi,

This is the Arm Compiler forum. Your query would be better answered in the GCC toolchain forum: https://community.arm.com/developer/tools-software/oss-platforms/f/gnu-toolchain-forum

Thanks

Peterson

Tamar Christina over 6 years ago

Hi br-dev,

If you're talking about the latency of the instruction itself, then there isn't really an alternative to it.

You could in principle do it with two shifts (left and right) and an OR, but that will undoubtedly be more expensive.

Depending on the actual operation you're doing, you may be able to use a different sequence, but if you're only talking about vext then I don't believe there is one.

Regards,

Tamar
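
For readers following along, here is a minimal scalar sketch in C of why the shift-and-OR route gives the same result as the in-place `vext.32 q1, q1, q1, #3` from the thread title. It is a model only, not NEON code and not something posted in the thread: the function names `vext32_model` and `shift_or_model` are hypothetical, the register is modelled as four `uint32_t` lanes with lane 0 lowest, and it says nothing about cycle counts.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of VEXT.32 Qd, Qn, Qm, #imm (imm in 0..3).
 * Lane i of the result is lane (i + imm) of the 8-lane concatenation
 * n[0..3] followed by m[0..3]. d may alias n and/or m (in-place use). */
static void vext32_model(uint32_t d[4], const uint32_t n[4],
                         const uint32_t m[4], unsigned imm)
{
    uint32_t cat[8];
    assert(imm < 4);
    /* Copy both inputs first so that writing d cannot clobber them. */
    memcpy(cat, n, 4 * sizeof cat[0]);
    memcpy(cat + 4, m, 4 * sizeof cat[0]);
    for (unsigned i = 0; i < 4; i++)
        d[i] = cat[i + imm];
}

/* Hypothetical model of the "two shifts and an OR" alternative for the
 * in-place case with #3: treat the register as one 128-bit value,
 * shift it right by 96 bits, shift it left by 32 bits, then OR. */
static void shift_or_model(uint32_t d[4], const uint32_t q[4])
{
    const uint32_t right[4] = { q[3], 0, 0, 0 };       /* whole register >> 96 */
    const uint32_t left[4]  = { 0, q[0], q[1], q[2] }; /* whole register << 32 */
    for (unsigned i = 0; i < 4; i++)
        d[i] = right[i] | left[i];
}
```

With lanes `{10, 20, 30, 40}` as input, both functions produce `{40, 10, 20, 30}`, i.e. a rotation by one lane. The alternative needs three dependent operations where VEXT needs one, which is the cost concern raised above.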

br-dev over 6 years ago in reply to Tamar Christina

Thanks to all for the answers: Paul, Peterson and Tamar. Your answers helped sort it out, as per Tamar's comment. I understand the latency of the instruction in the in-place case; unfortunately, I also do not see another way to do this operation. Using a different sequence leads to the same point in my use case (all roads lead to Rome). Closing the topic then. Cheers.