How to efficiently sum 4 x 8bit integers with ARM or NEON

Note: This was originally posted on 17th September 2010 at http://forums.arm.com

Hi,

I am trying to write an ASM function to shrink an 8-bit greyscale image by 4, so I need to get the sum of 4 bytes very quickly. From what I have read, NEON needs at least 32-bit integers and VFP is for floats, so it looks like I should just stick with ARM (or Thumb-2) instructions.

But I'm just a beginner, so I'm wondering if there is a more efficient way of summing 4 consecutive bytes than converting each byte to a 32-bit int and then summing them (and then shifting right to get the average).

It's for a Cortex-A8 (ARMv7-A), and the data is aligned to 32 bytes (or whatever I want).
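A minimal sketch of one way to avoid widening each byte separately on the ARM side, assuming the four pixels are already packed into one 32-bit word (for example via a single aligned 32-bit load). The SWAR ("SIMD within a register") trick below adds neighbouring bytes in parallel using masks and one add, then folds the two 16-bit partial sums together. The function names are hypothetical. On ARMv6 and later, including the Cortex-A8, the USAD8 instruction with a zero second operand gives the same sum of four unsigned bytes in a single instruction, which is worth trying in hand-written assembly.

```c
#include <stdint.h>

/* Sum the four bytes packed in a 32-bit word using SWAR:
 * add neighbouring bytes in parallel, then fold the two 16-bit lanes. */
static uint32_t sum4_bytes(uint32_t w)
{
    /* Low 16-bit lane = byte0 + byte1, high lane = byte2 + byte3 (each <= 510). */
    uint32_t pairs = (w & 0x00FF00FFu) + ((w >> 8) & 0x00FF00FFu);
    /* Add the two 16-bit lanes together: the result is at most 1020. */
    return (pairs & 0xFFFFu) + (pairs >> 16);
}

/* Truncated average of the four bytes (divide by 4 with a shift). */
static uint8_t avg4_bytes(uint32_t w)
{
    return (uint8_t)(sum4_bytes(w) >> 2);
}
```

For a word holding the bytes 10, 20, 30 and 40 (`0x281E140A`), `sum4_bytes` returns 100 and `avg4_bytes` returns 25. The 16-bit lanes cannot overflow into each other because two bytes sum to at most 510.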
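On the NEON point: NEON does have 8-bit and 16-bit integer lanes, and its widening pairwise adds (VPADDL.U8, plus VPADAL.U8, which also accumulates) fit box filtering well. Below is a minimal sketch in C intrinsics, assuming the shrink is by 4 in both directions (each output pixel is the truncated mean of a 4x4 block) and that 32 source pixels of each of 4 rows are handled per call; `shrink_32x4_to_8` is a hypothetical name. The NEON path does 8 loads (2 per row) and one 8-byte store. The scalar fallback lets the same file build on non-ARM hosts and computes the same results.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Shrink a 32x4 block of 8-bit greyscale pixels to 8 output pixels, each
 * the truncated mean of a 4x4 box. `stride` is the source row pitch in bytes. */
static void shrink_32x4_to_8(const uint8_t *src, size_t stride, uint8_t *dst)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* Row 0: VPADDL.U8 adds adjacent byte pairs into 16-bit lanes. */
    uint16x8_t lo = vpaddlq_u8(vld1q_u8(src));
    uint16x8_t hi = vpaddlq_u8(vld1q_u8(src + 16));
    /* Rows 1..3: VPADAL.U8 pairwise-adds and accumulates (<= 8*255 per lane). */
    for (int r = 1; r < 4; r++) {
        lo = vpadalq_u8(lo, vld1q_u8(src + r * stride));
        hi = vpadalq_u8(hi, vld1q_u8(src + r * stride + 16));
    }
    /* Add neighbouring 16-bit lanes: each lane is now a 4x4 box sum (<= 4080). */
    uint16x4_t s_lo = vpadd_u16(vget_low_u16(lo), vget_high_u16(lo));
    uint16x4_t s_hi = vpadd_u16(vget_low_u16(hi), vget_high_u16(hi));
    /* Divide by 16 and narrow back to 8 bits in one instruction (VSHRN). */
    vst1_u8(dst, vshrn_n_u16(vcombine_u16(s_lo, s_hi), 4));
#else
    /* Portable scalar fallback computing the same truncated 4x4 means. */
    for (int x = 0; x < 8; x++) {
        unsigned sum = 0;
        for (int r = 0; r < 4; r++)
            for (int k = 0; k < 4; k++)
                sum += src[r * stride + 4 * x + k];
        dst[x] = (uint8_t)(sum >> 4);
    }
#endif
}
```

If rounding to nearest is preferred over truncation, the rounding narrow `vrshrn_n_u16` can replace `vshrn_n_u16`.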

Cheers,
Shervin Emami
http://www.shervinemami.co.cc/
  • Note: This was originally posted on 27th September 2010 at http://forums.arm.com

    If you're down to final tweaking, it might be worth experimenting with preloading ahead in the source image.

    It's funny, I was thinking of asking you about memory preloading, but I thought I had already taken up too much of your time as it is :-) From the few posts I've read about NEON optimisation (mainly on the FFmpeg message boards, I think), they say that memory preloading takes some trial and error to get the right values in the right places. Is that right?

    I tried aligning in GCC using:
         VLD1.u8 {q0}, [r0:128]!
    but it still gives an error, and I have tried every keyboard symbol in place of @ but it still won't work. I'll try using NASM instead.
    Anyway, I still don't understand why you aligned some accesses to @128, some to @64 and some not at all. Wouldn't it work better if all 8 loads and the store used an alignment hint (such as @64 on everything if it's a 480-pixel-wide image, or @128 if it's a 640-pixel-wide image)?

    Thanks a lot for your help! I'm still contemplating whether to attempt a generic image resizing function (from any size to any size) using NEON or whether it would be too difficult to take advantage of SIMD for that type of operation.

    Cheers,
    Shervin Emami.
    http://www.shervinemami.co.cc/
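On the preloading question: in C, GCC and Clang expose a cache prefetch hint as `__builtin_prefetch`, which they lower to PLD on ARM. A minimal sketch of the idea on a single row, assuming 64-byte cache lines and a hypothetical `PREFETCH_AHEAD` distance. As the FFmpeg discussions suggest, that distance is something to measure on the target rather than derive. The hint is skipped near the end of the buffer so the pointer arithmetic stays in bounds.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tuning knob: how far ahead (in bytes) to request data.
 * Good values are found by measuring on the target device. */
#ifndef PREFETCH_AHEAD
#define PREFETCH_AHEAD 192
#endif

/* Horizontally shrink one row by 4 (truncated mean of each 4 bytes),
 * issuing a prefetch hint PREFETCH_AHEAD bytes ahead of the read position.
 * `width` must be a multiple of 4. */
static void shrink_row_by4_prefetch(const uint8_t *src, size_t width, uint8_t *dst)
{
    for (size_t i = 0; i < width; i += 4) {
        /* One hint per assumed 64-byte line; only if it stays inside the buffer. */
        if ((i & 63) == 0 && i + PREFETCH_AHEAD < width)
            __builtin_prefetch(src + i + PREFETCH_AHEAD, 0 /* read */, 0 /* streaming */);
        unsigned sum = src[i] + src[i + 1] + src[i + 2] + src[i + 3];
        dst[i / 4] = (uint8_t)(sum >> 2);
    }
}
```

The prefetch never changes the result, only the timing, so different `PREFETCH_AHEAD` values can be compared by building with `-DPREFETCH_AHEAD=...` and timing each build.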
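On the alignment hints: with GNU as the qualifier is normally written after a comma, as in `vld1.8 {q0}, [r0, :128]!`, while the `[r0@128]` spelling is armasm syntax. A hint is a promise that every address the instruction sees is that aligned; if one is not, the access takes an alignment fault rather than just running slower. So the safe hint for a load repeated on every row depends on the base alignment, the row stride and the access size (a fixed column offset, such as +16 for the second load of a row, limits it in the same way). The hypothetical helper below sketches that arithmetic for accesses of 8, 16 or 32 bytes.

```c
#include <stddef.h>

/* Largest power of two dividing x (x > 0): isolates the lowest set bit. */
static size_t lowest_pow2(size_t x)
{
    return x & (~x + 1);
}

/* Alignment hint in bits (64, 128 or 256) that is safe for an access of
 * `access_bytes` (8, 16 or 32) repeated at base, base + stride, ...,
 * where `base` is aligned to `base_align` bytes (a power of two).
 * Returns 0 when no NEON alignment hint is safe (below 8 bytes). */
static unsigned safe_hint_bits(size_t stride, size_t base_align, size_t access_bytes)
{
    size_t a = lowest_pow2(stride);
    if (base_align < a) a = base_align;
    if (access_bytes < a) a = access_bytes;
    return a >= 8 ? (unsigned)(a * 8) : 0u;
}
```

With the 32-byte-aligned buffer mentioned earlier, `safe_hint_bits(480, 32, 16)` gives 128 for 16-byte source loads on a 480-pixel-wide image, but the 120-byte output rows of a shrink by 4 only allow 64 (`safe_hint_bits(120, 32, 8)`). A 640-pixel image has 160-byte output rows, where a 16-byte store could use :128.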
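On a generic any-size resizer: one common starting point (not the only one) is nearest-neighbour sampling with a fixed-point step, sketched below with a hypothetical `resize_nearest` using 16.16 fixed point. It also shows why arbitrary ratios are harder to vectorise than the fixed 4:1 case: neighbouring output pixels read source pixels that are not a constant stride apart, so they no longer map onto contiguous vector loads and pairwise adds. As written it assumes source dimensions below 65536 so the fixed-point products fit in 32 bits.

```c
#include <stddef.h>
#include <stdint.h>

/* Nearest-neighbour resize of an 8-bit greyscale image to any size,
 * stepping through the source in 16.16 fixed point. Strides are in bytes. */
static void resize_nearest(const uint8_t *src, size_t sw, size_t sh, size_t sstride,
                           uint8_t *dst, size_t dw, size_t dh, size_t dstride)
{
    /* Source step per destination pixel, in 16.16 fixed point. */
    uint32_t xstep = (uint32_t)((sw << 16) / dw);
    uint32_t ystep = (uint32_t)((sh << 16) / dh);
    for (size_t y = 0; y < dh; y++) {
        /* Integer part of y * ystep selects the source row. */
        const uint8_t *row = src + ((y * ystep) >> 16) * sstride;
        for (size_t x = 0; x < dw; x++)
            dst[y * dstride + x] = row[(x * xstep) >> 16];
    }
}
```

Because `xstep` is rounded down, the last sampled column is always below `sw`, so the reads stay inside each source row.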