Bilinear interpolation optimization using ARM
Ahmed Tolba
over 11 years ago
Note: This was originally posted on 16th March 2013 at
http://forums.arm.com
I'm working on ARM and trying to optimize downsampling an image. I have used OpenCV's cv::resize, and it is slow: about 3 ms to go from 1280*960 to 400*300. I'm trying to use ARM NEON to accelerate it, but I find it hard to work out how to rewrite it with NEON instructions. I also read a few tutorials from arm.blog and hilberspace.de, but a lot of information about the topic is still missing, so I would also like someone to recommend a book or other resource on it.
I tried to think about the problem in NEON terms: the load instructions fill a register with 8 consecutive bytes of a row at a time, but my algorithm loads 4 bytes and interpolates between them. I couldn't work out how to implement that in NEON. It would be a great learning experience.
Here is my function: http://pastebin.com/C2YdeV0M (there seems to be a problem with formatting code here in the forums).
Gilead Kutnick
over 11 years ago
Note: This was originally posted on 19th March 2013 at
http://forums.arm.com
Converting your bilinear interpolation algorithm directly to NEON is difficult because it has to look up pixels at arbitrary 2D locations. That's hard for NEON since there's no way to load memory into a vector using a vector of indexes, and moving the indexes from NEON to ARM registers and then doing several loads can be slow. You can make the problem easier if you split it into two steps that work horizontally and vertically respectively. What you want to do then is take adv
If you only want to do 1280x960 to 400x300 then it's best to hard-code a routine for that. In this case you will have source indexes that are 0, 3.2, 6.4, 9.6, 12.8, 16, etc. Since the index fractions repeat every 5 output pixels (16 source pixels), you can loop in groups of 5x5 output pixels and hard-code the multiplication coefficients. You can do these multiplications with small fixed-point integers instead of floating point, which will be faster, especially with NEON. I can give you more information on how I'd do this if this is what you want.
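As a rough illustration of the hard-coded coefficients described above, here is a minimal scalar C sketch of the horizontal pass only. It is not the original poster's code and not NEON: the function name `resize_row_16_to_5`, the tables, the 8-bit weight precision, and the single-channel 8-bit pixel format are all assumptions made for the example. It uses the index sequence from this reply (source index = 3.2 * output index), which is not necessarily what cv::resize computes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tables for the 5 repeating phases of a 16:5 downscale.
   Phase k samples source position 3.2 * k: integer part in kOffset,
   fractional part as round(frac * 256) in kWeight. */
static const uint8_t  kOffset[5] = {0, 3, 6, 9, 12};
static const uint16_t kWeight[5] = {0, 51, 102, 154, 205};

/* Resample one 8-bit row horizontally by 16:5 with linear interpolation.
   Assumes dst_width is a multiple of 5 and src holds (dst_width / 5) * 16
   pixels, so every read stays inside the current 16-pixel group. */
static void resize_row_16_to_5(const uint8_t *src, uint8_t *dst,
                               size_t dst_width)
{
    for (size_t x = 0; x < dst_width; x += 5) {
        const uint8_t *s = src + (x / 5) * 16;
        for (int k = 0; k < 5; ++k) {
            unsigned a = s[kOffset[k]];      /* left neighbour  */
            unsigned b = s[kOffset[k] + 1];  /* right neighbour */
            unsigned w = kWeight[k];         /* weight of b, in 1/256ths */
            /* Weighted average with rounding, then drop the 8 fraction bits. */
            dst[x + k] = (uint8_t)((a * (256 - w) + b * w + 128) >> 8);
        }
    }
}
```

Because the offsets and weights are compile-time constants, the inner loop has no index arithmetic that depends on the data, which is the property that should make it practical to vectorize. The vertical pass would use the same five weights applied between two rows.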
However, are you sure you really want a direct bilinear interpolation from 1280x960 to 400x300? A resize ratio of 3.2 in each direction is too high: each output pixel samples only a 2x2 neighbourhood of the source, so you can end up with aliasing on high-frequency content. I would recommend first scaling from 1280x960 to 640x480, then to 400x300. You can do this directly; it's just another step in the loop.
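One possible form of the intermediate halving step is a 2x2 box average, sketched below in scalar C. The reply does not say which filter to use for the halving, so the box filter, the function name `halve_2x2`, and the single-channel 8-bit layout are assumptions for illustration. After halving, the remaining scale factor is 640/400 = 1.6, whose fractions (0, .6, .2, .8, .4) still repeat every 5 output pixels.

```c
#include <stddef.h>
#include <stdint.h>

/* Halve an 8-bit single-channel image by averaging each 2x2 block.
   dst_width and dst_height are the output dimensions; the source must be
   at least twice that size in each direction. Strides are in bytes. */
static void halve_2x2(const uint8_t *src, size_t src_stride,
                      uint8_t *dst, size_t dst_stride,
                      size_t dst_width, size_t dst_height)
{
    for (size_t y = 0; y < dst_height; ++y) {
        const uint8_t *r0 = src + (2 * y) * src_stride;  /* upper source row */
        const uint8_t *r1 = r0 + src_stride;             /* lower source row */
        uint8_t *d = dst + y * dst_stride;
        for (size_t x = 0; x < dst_width; ++x) {
            unsigned sum = r0[2 * x] + r0[2 * x + 1]
                         + r1[2 * x] + r1[2 * x + 1];
            d[x] = (uint8_t)((sum + 2) >> 2);  /* rounded mean of 4 pixels */
        }
    }
}
```

Every output pixel here reads a fixed, contiguous pattern of source pixels, so this step avoids the arbitrary-index lookups that make the general algorithm awkward on NEON.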