NEON (SIMD) does not give a performance increase

Note: This was originally posted on 15th August 2011 at http://forums.arm.com

Hello!

I am trying to manually increase the performance of some image processing functions.

I am using NEON intrinsics and the C language.

If the processed elements are stored one after another (or close together) in memory, I can load them all at once with pld, and in such cases I get a performance win of about 3-4 times.

But if the processed elements are located at random places in memory (for example when I do bilinear interpolation), there are no NEON intrinsics for doing it quickly. I need to place the elements in an array manually, one by one (with C code), and then pld this array with NEON, or use vgetq_lane_ / vsetq_lane. I think these actions take most of the time. In these cases I get no performance win at all.
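For illustration, the gather pattern described above might look like the sketch below. `gather_brighten8` is a hypothetical helper (it brightens 8 scattered pixels, saturating at 255); the NEON path is only compiled when `__ARM_NEON` is defined, with a portable scalar fallback otherwise. The gather loop itself is plain scalar code on both paths, one load and one store per pixel, while NEON only replaces a single saturating add. That is consistent with the observation that the vector version gains little in this case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: add `add` (saturating at 255) to 8 pixels that sit
 * at arbitrary offsets idx[0..7] in src, writing the results to out. */
void gather_brighten8(const uint8_t *src, const size_t idx[8], uint8_t add,
                      uint8_t out[8])
{
    assert(src != NULL && idx != NULL && out != NULL);

    /* Scalar gather into a contiguous staging buffer: one load and one
     * store per pixel, identical on NEON and non-NEON builds. */
    uint8_t tmp[8];
    for (int i = 0; i < 8; i++)
        tmp[i] = src[idx[i]];

#if defined(__ARM_NEON)
    /* Only the arithmetic is vectorized: one vector load, one saturating
     * add over 8 lanes, one vector store. */
    uint8x8_t v = vld1_u8(tmp);
    vst1_u8(out, vqadd_u8(v, vdup_n_u8(add)));
#else
    /* Portable fallback with the same saturating behavior. */
    for (int i = 0; i < 8; i++) {
        unsigned s = (unsigned)tmp[i] + add;
        out[i] = (uint8_t)(s > 255u ? 255u : s);
    }
#endif
}
```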

How can I speed up such functions? What am I doing wrong?

Thank you.
Reply



    You're right.
    NEON is not very good at processing non-sequential data.

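    For bilinear interpolation, the best approach is to split it into 2 distinct passes:
    - one on the X axis, which will be slow (each output pixel reads source pixels at its own, non-uniform position, so the loads are essentially scalar);
    - one on the Y axis, which will be very fast (each output row is a blend of two whole rows with a single weight, so the data is contiguous).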
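A minimal sketch of the two-pass idea for enlarging an 8-bit grayscale image, assuming 7-bit fixed-point weights (so both weights, 0..128, fit in a `uint8_t` for `vmull_u8`) and rounding the intermediate image to 8 bits. This is not the code from the linked posts, and all function names are hypothetical. The X pass is scalar because every output pixel reads two source pixels at its own position; the Y pass blends two whole rows of the intermediate image with one weight, so it loads 16 contiguous pixels at a time with NEON when `__ARM_NEON` is defined and uses the same formula in scalar C otherwise.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* 7-bit fixed-point weights: w and FIX_ONE - w (0..128) both fit in uint8_t. */
#define FRAC_BITS 7
#define FIX_ONE (1 << FRAC_BITS)

/* Source coordinate of output coordinate d, in 1/FIX_ONE units. */
static int32_t src_pos(int d, int s_len, int d_len)
{
    if (d_len <= 1)
        return 0;
    return (int32_t)((int64_t)d * (s_len - 1) * FIX_ONE / (d_len - 1));
}

/* Pass 1, X axis: each output pixel reads two source pixels at its own
 * position, so this pass stays scalar (the slow one). */
static void pass_x(const uint8_t *src, int sw, int sh, uint8_t *tmp, int dw)
{
    for (int y = 0; y < sh; y++) {
        const uint8_t *row = src + (size_t)y * sw;
        for (int x = 0; x < dw; x++) {
            int32_t fx = src_pos(x, sw, dw);
            int x0 = fx >> FRAC_BITS;
            int x1 = x0 + 1 < sw ? x0 + 1 : x0;   /* clamp at the right edge */
            int w = fx & (FIX_ONE - 1);
            tmp[(size_t)y * dw + x] = (uint8_t)(
                (row[x0] * (FIX_ONE - w) + row[x1] * w + FIX_ONE / 2) >> FRAC_BITS);
        }
    }
}

/* Pass 2 helper, Y axis: blend two contiguous rows with one weight w. */
static void blend_rows(const uint8_t *r0, const uint8_t *r1, uint8_t *out,
                       int n, int w)
{
    int x = 0;
#if defined(__ARM_NEON)
    uint8x8_t wa = vdup_n_u8((uint8_t)(FIX_ONE - w));
    uint8x8_t wb = vdup_n_u8((uint8_t)w);
    for (; x + 16 <= n; x += 16) {
        uint8x16_t a = vld1q_u8(r0 + x);
        uint8x16_t b = vld1q_u8(r1 + x);
        /* Widen to 16 bits: a * (FIX_ONE - w) + b * w <= 255 * 128. */
        uint16x8_t lo = vmlal_u8(vmull_u8(vget_low_u8(a), wa), vget_low_u8(b), wb);
        uint16x8_t hi = vmlal_u8(vmull_u8(vget_high_u8(a), wa), vget_high_u8(b), wb);
        /* Rounding narrowing shift: (v + 64) >> 7, same as the scalar tail. */
        vst1q_u8(out + x, vcombine_u8(vrshrn_n_u16(lo, FRAC_BITS),
                                      vrshrn_n_u16(hi, FRAC_BITS)));
    }
#endif
    for (; x < n; x++)   /* scalar tail (or the whole row without NEON) */
        out[x] = (uint8_t)((r0[x] * (FIX_ONE - w) + r1[x] * w + FIX_ONE / 2) >> FRAC_BITS);
}

/* Enlarges an sw x sh 8-bit image into dw x dh. Returns 0 on success,
 * -1 if the intermediate buffer cannot be allocated. */
int bilinear_enlarge_u8(const uint8_t *src, int sw, int sh,
                        uint8_t *dst, int dw, int dh)
{
    assert(src != NULL && dst != NULL);
    assert(sw > 0 && sh > 0 && dw > 0 && dh > 0);
    uint8_t *tmp = malloc((size_t)dw * (size_t)sh);   /* dw x sh intermediate */
    if (tmp == NULL)
        return -1;
    pass_x(src, sw, sh, tmp, dw);
    for (int y = 0; y < dh; y++) {
        int32_t fy = src_pos(y, sh, dh);
        int y0 = fy >> FRAC_BITS;
        int y1 = y0 + 1 < sh ? y0 + 1 : y0;       /* clamp at the bottom edge */
        blend_rows(tmp + (size_t)y0 * dw, tmp + (size_t)y1 * dw,
                   dst + (size_t)y * dw, dw, fy & (FIX_ONE - 1));
    }
    free(tmp);
    return 0;
}
```

For example, enlarging the 2x2 image `{0, 100, 200, 50}` to 3x3 with `bilinear_enlarge_u8(src, 2, 2, dst, 3, 3)` fills `dst` with `{0, 50, 100, 100, 88, 75, 200, 125, 50}`. Because `vrshrn_n_u16` adds 64 before shifting right by 7, the NEON loop and the scalar tail round identically, so both builds give the same output.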

    I've implemented bilinear interpolation for enlargement (upscaling).
    My first version was 12 times faster than the C version:
    http://pulsar.webshaker.net/2011/05/25/bilinear-enlarge-with-neon/

    After that, I managed to optimize it up to 16 times faster:
    http://pulsar.webshaker.net/2011/07/15/agrandissement-bilineaire-la-vengeance/
    PS: sorry, this time I did not translate it into English. If you want to make the translation, send it to me.

    If you are doing bilinear interpolation for reduction (downscaling), sorry, I have not done it yet.
    But I'll do it soon.

    Etienne