
bilinear interpolation optimization using ARM

Note: This was originally posted on 16th March 2013 at http://forums.arm.com

I'm working on an ARM platform and trying to optimize image downsampling. I used OpenCV's cv::resize, but it is slow: about 3 ms to go from 1280*960 to 400*300. I would like to use ARM NEON to accelerate it, but I'm finding it hard to work out how to rewrite the code with NEON instructions. I have read a few tutorials from arm.blog and hilberspace.de, but a lot of information on the topic is still missing, so I would also appreciate a recommendation for a book or other reference.
Thinking about the problem in NEON terms: the load instructions fill a register with 8 consecutive bytes at a time, but my algorithm loads 4 bytes and interpolates between them, and I can't see how to implement that with NEON. It would be a great learning experience :)
Here is my function: http://pastebin.com/C2YdeV0M
(There seems to be a problem with formatting code here in the forums.)
  • Note: This was originally posted on 27th March 2013 at http://forums.arm.com

    You can do bilinear interpolation with NEON; you just can't replace every part of your original algorithm with NEON instructions. In particular, you can't do something like x = y[z] across a NEON vector register when z is a variable rather than a constant, because that would mean an independent load for each lane of the vector. A NEON load can only be addressed through a single ordinary ARM register.

    So if your algorithm needs this, you have to move each lane to an ARM register and then perform a separate load into each lane of a NEON register, which is slow (especially on Cortex-A8, where moving data from NEON to ARM registers carries a large penalty). In the code you posted, this is done in lines 18-21.
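
To make the cost concrete, here is a minimal sketch of what a variable-index gather looks like with NEON intrinsics. The function name `gather8` and its signature are made up for illustration and are not taken from the posted code. Each lane needs its own `vld1_lane_u8` call, and the lane number has to be a compile-time constant, so the loads cannot be folded into one instruction. On non-NEON targets the block falls back to the plain scalar loop.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: out[i] = y[z[i]] for i = 0..7. */
void gather8(const uint8_t *y, const uint16_t z[8], uint8_t out[8])
{
    assert(y && z && out);
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* One scalar-addressed load per lane: eight separate loads. */
    uint8x8_t v = vdup_n_u8(0);
    v = vld1_lane_u8(y + z[0], v, 0);
    v = vld1_lane_u8(y + z[1], v, 1);
    v = vld1_lane_u8(y + z[2], v, 2);
    v = vld1_lane_u8(y + z[3], v, 3);
    v = vld1_lane_u8(y + z[4], v, 4);
    v = vld1_lane_u8(y + z[5], v, 5);
    v = vld1_lane_u8(y + z[6], v, 6);
    v = vld1_lane_u8(y + z[7], v, 7);
    vst1_u8(out, v);
#else
    /* Scalar fallback with the same result. */
    for (int i = 0; i < 8; ++i)
        out[i] = y[z[i]];
#endif
}
```

If the indices themselves were computed in a NEON register, each one would first have to be moved to an ARM register before it could be used as an address, which is the transfer described above.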

    If you aren't doing general bilinear interpolation but are using a fixed ratio, you don't need to index with variables this way. If you must have general bilinear interpolation, the better approach is to do the horizontal and vertical passes separately.
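
A rough sketch of the two-pass idea in portable scalar C, for an 8-bit single-channel image. All names here (`resize_row_h`, `blend_rows_v`, `resize_bilinear_2pass`) are invented for this example. It assumes a simple top-left-aligned coordinate mapping with 8-bit weights, so its output will not match cv::resize exactly, and rounding once per pass can differ slightly from a single-step bilinear blend. The point is the shape of the vertical pass: contiguous loads and one weight shared by the whole row, which is the part that maps directly onto vector instructions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Horizontal pass: resample one source row to dst_w pixels.
   Positions are 16.16 fixed point; the weight is reduced to 8 bits. */
static void resize_row_h(const uint8_t *src, int src_w, uint8_t *dst, int dst_w)
{
    uint32_t step = ((uint32_t)src_w << 16) / (uint32_t)dst_w;
    uint32_t pos = 0;
    for (int x = 0; x < dst_w; ++x, pos += step) {
        int x0 = (int)(pos >> 16);
        int x1 = x0 + 1 < src_w ? x0 + 1 : src_w - 1; /* clamp at the edge */
        uint32_t w = (pos >> 8) & 0xFF;               /* weight of src[x1] */
        dst[x] = (uint8_t)((src[x0] * (256 - w) + src[x1] * w + 128) >> 8);
    }
}

/* Vertical pass: blend two resampled rows using one weight for the row. */
static void blend_rows_v(const uint8_t *row0, const uint8_t *row1,
                         uint32_t w, uint8_t *dst, int n)
{
    for (int x = 0; x < n; ++x)
        dst[x] = (uint8_t)((row0[x] * (256 - w) + row1[x] * w + 128) >> 8);
}

/* Returns 0 on success, -1 if the temporary buffer cannot be allocated. */
int resize_bilinear_2pass(const uint8_t *src, int src_w, int src_h,
                          uint8_t *dst, int dst_w, int dst_h)
{
    assert(src_w > 0 && src_h > 0 && dst_w > 0 && dst_h > 0);
    assert(src_w < 65536 && src_h < 65536); /* keeps 16.16 math in range */

    uint8_t *tmp = malloc((size_t)dst_w * (size_t)src_h);
    if (!tmp)
        return -1;

    for (int y = 0; y < src_h; ++y)
        resize_row_h(src + (size_t)y * src_w, src_w,
                     tmp + (size_t)y * dst_w, dst_w);

    uint32_t step = ((uint32_t)src_h << 16) / (uint32_t)dst_h;
    uint32_t pos = 0;
    for (int y = 0; y < dst_h; ++y, pos += step) {
        int y0 = (int)(pos >> 16);
        int y1 = y0 + 1 < src_h ? y0 + 1 : src_h - 1;
        blend_rows_v(tmp + (size_t)y0 * dst_w, tmp + (size_t)y1 * dst_w,
                     (pos >> 8) & 0xFF, dst + (size_t)y * dst_w, dst_w);
    }

    free(tmp);
    return 0;
}
```

For the sizes in the question this would be called as `resize_bilinear_2pass(src, 1280, 960, dst, 400, 300)`. The sketch resamples every source row for simplicity; a tuned version would only resample the rows that the vertical pass actually reads.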

    It's possible to replace the indexing with a combination of vector reduction and vtbl instructions, but explaining how is a lot of work, and I don't want to do that for something you're only curious about.
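
As a hedged illustration of how vtbl can stand in for per-lane indexing when the indices are known in advance (not necessarily the scheme the reply has in mind): 1280 to 400 is a 16:5 ratio, so under a top-left-aligned mapping output pixel i of each group of 5 sits at source position 3.2*i within a block of 16 source pixels. The index and weight tables are then constants. The function name `downscale_16_to_5`, the tables, and the choice of 7-bit weights (so that both weights fit in a byte) are assumptions made for this sketch. Non-NEON targets use a scalar fallback that mimics VTBL, including its rule that an out-of-range index yields 0.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Output pixel i blends src[3*i] and src[3*i + 1]; index 255 marks the
   three unused lanes. Weights are round(128 * 0.2 * i) and its complement. */
static const uint8_t IDX0[8] = {0, 3, 6, 9, 12, 255, 255, 255};
static const uint8_t IDX1[8] = {1, 4, 7, 10, 13, 255, 255, 255};
static const uint8_t W0[8]   = {128, 102, 77, 51, 26, 0, 0, 0};
static const uint8_t W1[8]   = {0, 26, 51, 77, 102, 0, 0, 0};

/* Produce 5 output pixels from 16 consecutive source pixels. */
void downscale_16_to_5(const uint8_t src[16], uint8_t dst[5])
{
    assert(src && dst);
    uint8_t out[8];
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    uint8x8x2_t table;
    table.val[0] = vld1_u8(src);      /* source pixels 0..7  */
    table.val[1] = vld1_u8(src + 8);  /* source pixels 8..15 */

    /* Table lookups replace the per-lane loads. */
    uint8x8_t a = vtbl2_u8(table, vld1_u8(IDX0));
    uint8x8_t b = vtbl2_u8(table, vld1_u8(IDX1));

    /* acc = a*w0 + b*w1 in 16 bits, then rounding shift back to 8 bits. */
    uint16x8_t acc = vmull_u8(a, vld1_u8(W0));
    acc = vmlal_u8(acc, b, vld1_u8(W1));
    vst1_u8(out, vrshrn_n_u16(acc, 7));
#else
    for (int i = 0; i < 8; ++i) {
        uint8_t a = IDX0[i] < 16 ? src[IDX0[i]] : 0;
        uint8_t b = IDX1[i] < 16 ? src[IDX1[i]] : 0;
        out[i] = (uint8_t)((a * W0[i] + b * W1[i] + 64) >> 7);
    }
#endif
    for (int i = 0; i < 5; ++i)
        dst[i] = out[i];
}
```

A 1280-pixel row is exactly 80 such blocks, and no output pixel reads past index 13 of its block, so the blocks are independent. In a real loop the constant tables would be loaded once outside the loop rather than on every call.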