Bilinear interpolation optimization using ARM

Note: This was originally posted on 16th March 2013 at http://forums.arm.com

I'm working on ARM and I'm trying to optimize downsampling an image. I have used OpenCV's cv::resize, and it's slow: ~3 ms for 1280*960 to 400*300. I'm trying to use ARM NEON to accelerate it, but I find it hard to get along with the NEON instructions well enough to rewrite it. I also tried reading a few tutorials from arm.blog and hilberspace.de, but a lot of information on the topic is still missing, so I would also like someone to recommend a book or other resource on it.
I tried to think about the problem in terms of ARM NEON: the load instructions load 8 consecutive bytes at a time, but my algorithm needs to load 4 bytes and interpolate between them. I couldn't figure out how to implement that in ARM NEON. It would be a great learning experience :)
Here is my function: http://pastebin.com/C2YdeV0M
(There seems to be a problem with formatting code here in the forums.)
  • Note: This was originally posted on 19th March 2013 at http://forums.arm.com

    Converting your bilinear interpolation algorithm directly to NEON is difficult because it has to look up pixels at arbitrary 2D locations. That's hard for NEON because there's no way to load memory into a vector using a vector of indexes, and moving the indexes from NEON to ARM registers and then doing several scalar loads can be slow. You can make the problem easier by splitting it into two steps that work horizontally and vertically respectively. What you want to do then is take adv
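The horizontal/vertical split described above can be sketched in plain C as follows. This is a minimal illustration, not the original poster's code (the pastebin isn't reproduced here). It assumes an 8-bit grayscale image, Q8 fixed-point weights, and the mapping `src_pos = dst_pos * sw / dw` with no half-pixel offset (OpenCV's cv::resize uses a half-pixel offset, so results will differ slightly). The function names `hpass_row` and `resize_bilinear_separable` are hypothetical. The point of the split is that the column table is computed once, and the vertical pass blends two whole rows with a single weight over contiguous memory, which is the part that maps directly onto NEON vector loads.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Horizontal pass: interpolate one source row at the precomputed columns.
 * The result stays in Q8 (value * 256) so no precision is lost before the
 * vertical pass; 255 * 256 = 65280 fits in uint16_t. */
static void hpass_row(const uint8_t *row, int sw, uint16_t *out, int dw,
                      const int *x0, const uint16_t *wx)
{
    for (int x = 0; x < dw; x++) {
        int i = x0[x];
        int i1 = (i + 1 < sw) ? i + 1 : i; /* clamp at the right edge */
        out[x] = (uint16_t)(row[i] * (256 - wx[x]) + row[i1] * wx[x]);
    }
}

/* Separable bilinear resize of an 8-bit grayscale image (hypothetical helper).
 * Assumed mapping: src_pos = dst_pos * sw / dw. Returns 0 on success. */
int resize_bilinear_separable(const uint8_t *src, int sw, int sh,
                              uint8_t *dst, int dw, int dh)
{
    assert(sw > 0 && sh > 0 && dw > 0 && dh > 0);
    int *x0 = malloc((size_t)dw * sizeof *x0);
    uint16_t *wx = malloc((size_t)dw * sizeof *wx);
    uint16_t *top = malloc((size_t)dw * sizeof *top);
    uint16_t *bot = malloc((size_t)dw * sizeof *bot);
    int ok = x0 && wx && top && bot;

    if (ok) {
        /* Column table, computed once: integer index plus Q8 fraction. */
        for (int x = 0; x < dw; x++) {
            int64_t pos = (int64_t)x * sw * 256 / dw;
            x0[x] = (int)(pos >> 8);
            wx[x] = (uint16_t)(pos & 255);
        }
        for (int y = 0; y < dh; y++) {
            int64_t pos = (int64_t)y * sh * 256 / dh;
            int y0 = (int)(pos >> 8);
            int y1 = (y0 + 1 < sh) ? y0 + 1 : y0; /* clamp at the bottom */
            uint32_t wy = (uint32_t)(pos & 255);
            hpass_row(src + (size_t)y0 * sw, sw, top, dw, x0, wx);
            hpass_row(src + (size_t)y1 * sw, sw, bot, dw, x0, wx);
            /* Vertical pass: one weight for the whole row, contiguous data.
             * Q8 * Q8 = Q16; add 0.5 (32768) and shift to round to 8 bits. */
            for (int x = 0; x < dw; x++)
                dst[(size_t)y * dw + x] =
                    (uint8_t)((top[x] * (256 - wy) + bot[x] * wy + 32768) >> 16);
        }
    }
    free(x0); free(wx); free(top); free(bot);
    return ok ? 0 : -1;
}
```

For a 1280x960 to 400x300 call, `resize_bilinear_separable(src, 1280, 960, dst, 400, 300)` fills a 400*300 buffer. The horizontal pass still gathers bytes from computed indexes, so that is where most of the NEON effort would go.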

    If you only want to do 1280x960 to 400x300, it's best to hard-code a routine for that. In this case the source indexes are 0, 3.2, 6.4, 9.6, 12.8, 16, etc. Since the index fractions repeat every 5 output pixels (5 x 3.2 = 16 source pixels), you can loop in groups of 5x5 pixels and hard-code the multiplication coefficients. You can do these multiplications with small fixed-point integers instead of floating point, which will be faster, especially with NEON. I can give you more information on how I'd do this if this is what you want.
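As a rough sketch of the hard-coded idea: every group of 5 output pixels covers exactly 16 source pixels, so the integer offsets floor(k * 3.2) = {0, 3, 6, 9, 12} and the fractions {0, 0.2, 0.4, 0.6, 0.8} repeat and can be stored as constants. The version below assumes 8-bit grayscale and Q8 weights (fraction * 256, rounded to 0, 51, 102, 154, 205); the function name `resize_1280x960_to_400x300` and the tables are illustrative, not taken from the thread. It is scalar C, showing the structure a NEON version would follow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { SW = 1280, SH = 960, DW = 400, DH = 300 };

/* Per-phase constants for a 3.2 ratio; they repeat every 5 outputs. */
static const uint8_t k_off[5] = {0, 3, 6, 9, 12};       /* floor(k * 3.2) */
static const uint16_t k_w[5] = {0, 51, 102, 154, 205};  /* round(frac * 256) */

/* Hard-coded 1280x960 -> 400x300 bilinear downscale, 8-bit grayscale. */
void resize_1280x960_to_400x300(const uint8_t *src, uint8_t *dst)
{
    assert(src != NULL && dst != NULL);
    for (int gy = 0; gy < DH / 5; gy++) {          /* 5 output rows = 16 src rows */
        for (int ky = 0; ky < 5; ky++) {
            const uint8_t *r0 = src + (size_t)(gy * 16 + k_off[ky]) * SW;
            const uint8_t *r1 = r0 + SW;           /* max source row used: 957 */
            uint32_t wy = k_w[ky];
            uint8_t *out = dst + (size_t)(gy * 5 + ky) * DW;
            for (int gx = 0; gx < DW / 5; gx++) {  /* 5 output cols = 16 src cols */
                for (int kx = 0; kx < 5; kx++) {   /* constant trip count: unrollable */
                    int i = gx * 16 + k_off[kx];   /* max source column used: 1277 */
                    uint32_t wx = k_w[kx];
                    uint32_t top = r0[i] * (256 - wx) + r0[i + 1] * wx; /* Q8 */
                    uint32_t bot = r1[i] * (256 - wx) + r1[i + 1] * wx; /* Q8 */
                    /* Q16 result, rounded back to 8 bits */
                    out[gx * 5 + kx] =
                        (uint8_t)((top * (256 - wy) + bot * wy + 32768) >> 16);
                }
            }
        }
    }
}
```

Because the inner loop has a fixed trip count and fixed coefficients, it can be fully unrolled with the weights kept in registers. On NEON, rearranging the 16 source bytes of a group into "left" and "right" lanes could be done with the VTBL table-lookup instruction, but that is one possible approach rather than something from this thread.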

    However, are you sure you really want to use a direct bilinear interpolation from 1280x960 to 400x300? A resize ratio of 3.2x3.2 is too high: each output pixel only reads a 2x2 neighborhood of source pixels, so most source pixels are skipped and you can end up with aliasing on high-frequency content. I would recommend first scaling from 1280x960 to 640x480 and then to 400x300. This doesn't need a separate pass over the image; it's just another step in the loop.
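A minimal sketch of the 2:1 first step, assuming it is done as a 2x2 box filter (averaging each 2x2 block), which takes 1280x960 to 640x480 and uses every source pixel. The function name `halve_box2x2` is hypothetical and the image is again assumed to be 8-bit grayscale. After this step the remaining ratio to 400x300 is 1.6, so source positions are 0, 1.6, 3.2, 4.8, 6.4, 8, and the fractions again repeat every 5 output pixels (5 x 1.6 = 8 source pixels), so the same hard-coding idea still applies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Halve an 8-bit grayscale image in each dimension by averaging 2x2 blocks.
 * src is w x h (both even); dst is (w / 2) x (h / 2). */
void halve_box2x2(const uint8_t *src, int w, int h, uint8_t *dst)
{
    assert(w > 0 && h > 0 && w % 2 == 0 && h % 2 == 0);
    int dw = w / 2, dh = h / 2;
    for (int y = 0; y < dh; y++) {
        const uint8_t *r0 = src + (size_t)(2 * y) * w;
        const uint8_t *r1 = r0 + w;
        uint8_t *out = dst + (size_t)y * dw;
        for (int x = 0; x < dw; x++) {
            unsigned sum = r0[2 * x] + r0[2 * x + 1] + r1[2 * x] + r1[2 * x + 1];
            out[x] = (uint8_t)((sum + 2) >> 2); /* rounded average of 4 pixels */
        }
    }
}
```

This loop only touches contiguous bytes, so it vectorizes well: NEON has a pairwise add-long (the `vpaddlq_u8` intrinsic), a 16-bit vector add (`vaddq_u16`) and a rounding narrowing shift (`vrshrn_n_u16`) that correspond to the steps above. When folding it into the resize loop, the two halved rows that the vertical interpolation needs can be produced on the fly instead of writing out a full 640x480 buffer.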