NEON (SIMD) does not give performance increase

Note: This was originally posted on 15th August 2011 at http://forums.arm.com

Hello!

I am trying manually increase performance of some image processing functions.

I am using neon intrinsics and c language.

If processed elements located one by one or in one place i can load them with pld all at once and in such cases i have performance win about 3-4 times.

But if processed elements located at random places in memory (for example when i do bilinearinterpolation) there is no neon intrinsics for fast way of doing it. I need to place elements in array manually one by one(with c code) and then pld this array with neon. Or using vgetq_lane_, vsetq_lane.I think this actions take most of time.  In these case i have no performance win at all.

How to speed up such functions? What am i doing wrong?

Thank you.
Parents
No Data
Reply Children
No Data