bilinear interpolation optimization using ARM
Ahmed Tolba
over 12 years ago
Note: This was originally posted on 16th March 2013 at http://forums.arm.com
I'm working on an ARM platform and trying to optimize image downsampling. I used OpenCV's cv::resize, but it is slow: about 3 ms to go from 1280*960 to 400*300. I would like to use ARM NEON to accelerate it, but I find it hard to get to grips with the NEON instructions well enough to rewrite it. I have read a few tutorials from arm.blog and hilberspace.de, but a lot of information on the topic is still missing, so I would also appreciate a recommendation for a book or other resource.
I tried to think about the problem in NEON terms. The load instructions fetch 8 consecutive bytes of a row at a time, but my algorithm loads 4 bytes and interpolates between them, and I can't see how to implement that in NEON. It would be a great learning experience.
Here is my function (the forum seems to have a problem with code formatting, so I put it on pastebin):
http://pastebin.com/C2YdeV0M
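For readers who cannot open the pastebin link, here is a minimal scalar sketch of what "load 4 bytes and interpolate between them" usually means for bilinear interpolation. It is not the poster's code: the function name `bilinear_sample`, the 8-bit grayscale layout, and the 16.16 fixed-point coordinates are all assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical helper (not from the original post): sample one 8-bit
   grayscale pixel at source position (fx, fy), given as 16.16 fixed point.
   The caller must keep x0 + 1 and y0 + 1 inside the image. */
static uint8_t bilinear_sample(const uint8_t *src, int stride,
                               uint32_t fx, uint32_t fy)
{
    int x0 = (int)(fx >> 16);            /* integer part of the coordinates */
    int y0 = (int)(fy >> 16);
    uint32_t wx = (fx >> 8) & 0xFF;      /* top 8 bits of the fraction */
    uint32_t wy = (fy >> 8) & 0xFF;

    const uint8_t *p = src + y0 * stride + x0;

    /* Horizontal blend on the two rows, then a vertical blend of the results. */
    uint32_t top = p[0] * (256 - wx) + p[1] * wx;
    uint32_t bot = p[stride] * (256 - wx) + p[stride + 1] * wx;

    /* Two 8-bit weights were applied, so round and shift right by 16. */
    return (uint8_t)((top * (256 - wy) + bot * wy + 32768) >> 16);
}
```

For the 1280 to 400 case in the question, the horizontal step would be 1280/400 = 3.2 source pixels per output pixel, so the four neighbours of consecutive output pixels are not adjacent in memory.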
Peter Harris
over 12 years ago
Note: This was originally posted on 16th March 2013 at http://forums.arm.com
It's a vector unit, so vectorize.
Don't try to compute one output pixel at a time; compute many in parallel. That lets you use the full width of the SIMD hardware, which is where the performance comes from.
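As a rough illustration of "many pixels in parallel", the sketch below covers only the vertical half of a bilinear resize: blending two source rows with one weight, 8 pixels per iteration. The function name `blend_rows` and the 0..256 weight convention are assumptions for this example, not something from the thread. The horizontal half is harder to vectorize at non-integer scale factors, because the source pixels needed for neighbouring outputs are not contiguous, and it is not shown here. The NEON path is compiled only when the compiler targets NEON; otherwise the scalar loop handles every pixel.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: dst[x] = round((row0[x]*(256-w) + row1[x]*w) / 256).
   w is the weight of row1 and must be in the range 0..256. */
static void blend_rows(const uint8_t *row0, const uint8_t *row1,
                       uint8_t *dst, size_t n, unsigned w)
{
    size_t x = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    const uint16_t w1 = (uint16_t)w;
    const uint16_t w0 = (uint16_t)(256 - w);
    for (; x + 8 <= n; x += 8) {
        /* Load 8 pixels from each row and widen them to 16 bits. */
        uint16x8_t a = vmovl_u8(vld1_u8(row0 + x));
        uint16x8_t b = vmovl_u8(vld1_u8(row1 + x));
        /* a*w0 + b*w1 is at most 255*256, so it fits in 16 bits. */
        uint16x8_t acc = vmulq_n_u16(a, w0);
        acc = vmlaq_n_u16(acc, b, w1);
        /* Rounding shift right by 8, narrowing back to 8 bits. */
        vst1_u8(dst + x, vrshrn_n_u16(acc, 8));
    }
#endif
    /* Scalar tail, and the full loop on targets without NEON. */
    for (; x < n; ++x)
        dst[x] = (uint8_t)((row0[x] * (256 - w) + row1[x] * w + 128) >> 8);
}
```

The design point is that each instruction in the NEON loop works on 8 pixels at once, and the weight is constant across the row, so no per-pixel setup is needed.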