bilinear interpolation optimization using ARM
Ahmed Tolba
over 11 years ago
Note: This was originally posted on 16th March 2013 at
http://forums.arm.com
I'm working on ARM and trying to optimize downsampling an image. I have used OpenCV's cv::resize, but it is slow: about 3 ms to go from 1280*960 to 400*300. I'm trying to use ARM NEON to accelerate it, but I find it hard to get comfortable enough with the NEON instructions to rewrite it. I have also read a few tutorials from arm.blog and hilberspace.de, but a lot of information about the topic is still missing, so I would also like someone to recommend a book or other resource on it.
I tried to think about the problem in NEON terms. The load instructions fill a register with 8 consecutive bytes at a time, but my algorithm loads 4 bytes and interpolates between them, and I couldn't work out how to implement that in NEON. It would be a great learning experience.
My function is here (there seems to be a problem with formatting code in the forums):
http://pastebin.com/C2YdeV0M
Gilead Kutnick
over 11 years ago
Note: This was originally posted on 27th March 2013 at
http://forums.arm.com
You can do bilinear interpolation with NEON. You just can't replace every part of your original algorithm with NEON instructions. You can't do something like x = y[z] over a NEON vector register if z is a variable and not a constant, because that would mean doing several independent loads, one for each lane in the vector. A load to a NEON register can only take its address from a single normal ARM register.
So if your algorithm needs to do this you have to move all of the lanes to ARM registers and perform several loads to individual lanes in a NEON register, which is going to be slow (especially on Cortex-A8 where moving from NEON to ARM registers has a huge penalty). In the code you posted this is done in lines 18-21.
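To make the cost concrete, here is a minimal sketch of what such a per-lane gather looks like with NEON intrinsics. `gather8` is a hypothetical helper written for illustration, not taken from the posted code. Each lane gets its own `vld1_lane_u8` whose address is computed in ARM registers, so eight output bytes cost eight loads. A scalar fallback is included so the sketch also builds on targets without NEON.

```c
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: out[i] = src[idx[i]] for 8 lanes. */
static void gather8(const uint8_t *src, const uint16_t idx[8], uint8_t out[8])
{
#if defined(__ARM_NEON)
    uint8x8_t v = vdup_n_u8(0);
    /* The lane number must be a compile-time constant, so the
       "loop" is unrolled by hand: one load per lane. */
    v = vld1_lane_u8(src + idx[0], v, 0);
    v = vld1_lane_u8(src + idx[1], v, 1);
    v = vld1_lane_u8(src + idx[2], v, 2);
    v = vld1_lane_u8(src + idx[3], v, 3);
    v = vld1_lane_u8(src + idx[4], v, 4);
    v = vld1_lane_u8(src + idx[5], v, 5);
    v = vld1_lane_u8(src + idx[6], v, 6);
    v = vld1_lane_u8(src + idx[7], v, 7);
    vst1_u8(out, v);
#else
    /* Scalar fallback with the same result. */
    for (int i = 0; i < 8; i++)
        out[i] = src[idx[i]];
#endif
}
```

If the indices themselves were computed in a NEON register, they would first have to be moved to ARM registers before any of these loads could be issued, which is the transfer penalty described above.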
If you aren't doing a general bilinear interpolation but are using a fixed ratio, then you don't need to index with variables this way. If you must have general bilinear interpolation, then the better way to do it is to do the horizontal and vertical passes separately.
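As a rough sketch of the separate-pass idea, the plain C below resizes an 8-bit grayscale image with a horizontal pass followed by a vertical pass. The function names, the pixel-centre mapping, the 8-bit fixed-point weights, and the single-channel layout are all assumptions made for illustration; this is not the posted code and not what cv::resize does internally. The indices and weights are computed once per row and column rather than per pixel. For a fixed ratio such as 1280 to 400 (16:5), the horizontal taps repeat every 5 output pixels, each period spanning 16 input pixels, so they could be baked in as constants.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Precompute two source taps and a weight for every destination index.
   Pixel-centre mapping with 8 fractional bits:
   s = (d + 0.5) * n_src / n_dst - 0.5, clamped to the image. */
static void make_taps(int n_src, int n_dst, int *i0, int *i1, int *w)
{
    for (int d = 0; d < n_dst; d++) {
        int64_t f = ((2 * (int64_t)d + 1) * n_src * 256) / (2 * (int64_t)n_dst) - 128;
        int64_t max = (int64_t)(n_src - 1) * 256;
        if (f < 0) f = 0;
        if (f > max) f = max;
        i0[d] = (int)(f >> 8);
        i1[d] = (i0[d] + 1 < n_src) ? i0[d] + 1 : i0[d];
        w[d] = (int)(f & 255);          /* weight of tap i1, in 1/256 units */
    }
}

/* Blend two 8-bit samples; w is the weight of b in 1/256 units. */
static inline uint8_t blend(uint8_t a, uint8_t b, int w)
{
    return (uint8_t)((a * (256 - w) + b * w + 128) >> 8);
}

/* Returns 0 on success, -1 if a temporary buffer cannot be allocated. */
int resize_bilinear_separable(const uint8_t *src, int sw, int sh,
                              uint8_t *dst, int dw, int dh)
{
    assert(src && dst && sw > 0 && sh > 0 && dw > 0 && dh > 0);
    int *taps = malloc(sizeof *taps * 3 * ((size_t)dw + (size_t)dh));
    uint8_t *tmp = malloc((size_t)dw * (size_t)sh);
    if (!taps || !tmp) { free(taps); free(tmp); return -1; }

    int *x0 = taps, *x1 = x0 + dw, *wx = x1 + dw;
    int *y0 = wx + dw, *y1 = y0 + dh, *wy = y1 + dh;
    make_taps(sw, dw, x0, x1, wx);
    make_taps(sh, dh, y0, y1, wy);

    /* Horizontal pass: the indexed loads live here, but the indices
       come from a table that is the same for every row. */
    for (int y = 0; y < sh; y++) {
        const uint8_t *row = src + (size_t)y * sw;
        uint8_t *out = tmp + (size_t)y * dw;
        for (int x = 0; x < dw; x++)
            out[x] = blend(row[x0[x]], row[x1[x]], wx[x]);
    }

    /* Vertical pass: two contiguous rows and one weight per output row,
       so this loop needs no per-lane indexing at all. */
    for (int y = 0; y < dh; y++) {
        const uint8_t *a = tmp + (size_t)y0[y] * dw;
        const uint8_t *b = tmp + (size_t)y1[y] * dw;
        uint8_t *out = dst + (size_t)y * dw;
        for (int x = 0; x < dw; x++)
            out[x] = blend(a[x], b[x], wy[y]);
    }

    free(taps);
    free(tmp);
    return 0;
}
```

Two caveats about this sketch: rounding the intermediate image to 8 bits means results can differ slightly from a one-shot bilinear formula, and when downsampling it filters every source row even though only the rows named in `y0` and `y1` are read afterwards.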
It's possible to replace the indexing using a combination of vector reduction and vtbl instructions. But explaining how is a lot of work, and I don't want to do that for something you're only curious about.
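The full scheme is not spelled out here, so the following is only a sketch of the vtbl building block, not the author's method. `vtbl2_u8` selects bytes from a 16-byte table held in registers: each output lane takes the table byte named by the matching index lane, and an index of 16 or more yields 0. If every source byte needed for 8 output pixels lies inside one 16-byte window that has already been loaded, a variable index can be resolved this way without a round trip through ARM registers. `lookup16` is a hypothetical helper name, and a scalar fallback with the same semantics is included for targets without NEON.

```c
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper:
   out[i] = (idx[i] < 16) ? window[idx[i]] : 0, for 8 lanes. */
static void lookup16(const uint8_t window[16], const uint8_t idx[8], uint8_t out[8])
{
#if defined(__ARM_NEON)
    uint8x16_t q = vld1q_u8(window);        /* one contiguous 16-byte load */
    uint8x8x2_t table;
    table.val[0] = vget_low_u8(q);
    table.val[1] = vget_high_u8(q);
    uint8x8_t sel = vld1_u8(idx);           /* per-lane byte indices */
    vst1_u8(out, vtbl2_u8(table, sel));     /* in-register permute */
#else
    /* Scalar emulation of the table lookup semantics. */
    for (int i = 0; i < 8; i++)
        out[i] = (idx[i] < 16) ? window[idx[i]] : 0;
#endif
}
```

For example, with the 16:5 ratio of the 1280 to 400 case, the left-hand taps of five consecutive output pixels sit at offsets 1, 4, 7, 10 and 13 of a 16-pixel window under a pixel-centre mapping, so a constant index vector could pick them out of a single load.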