How to efficiently sum 4 x 8-bit integers with ARM or NEON

Note: This was originally posted on 17th September 2010 at http://forums.arm.com

Hi,

I am trying to write an ASM function to shrink an 8-bit greyscale image by 4, so I need to get the sum of 4 bytes very quickly. From what I have read, NEON needs at least 32-bit integers and VFP is for floats, so it looks like I should just stick with ARM (or Thumb-2) instructions.

But I'm just a beginner, so I'm wondering if there is a more efficient method of summing 4 consecutive bytes than converting each byte to a 32-bit int and then summing them (and then shifting right to get the average).
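
For reference, here is a minimal sketch in portable C of the baseline described above (widen each byte, add, shift right by 2), next to a word-at-a-time variant that loads the 4 bytes as one 32-bit value and adds them pairwise inside the register. It is only an illustration of the arithmetic, not an ARM or NEON instruction sequence, and the function names `avg4_scalar`, `avg4_swar` and `shrink_row_by_4` are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Baseline: widen each byte to 32 bits, add, then shift right by 2 (divide by 4). */
static uint8_t avg4_scalar(const uint8_t *p)
{
    uint32_t sum = (uint32_t)p[0] + p[1] + p[2] + p[3];
    return (uint8_t)(sum >> 2);   /* truncating average */
}

/* Word-at-a-time variant: one 32-bit load, then pairwise adds inside the register. */
static uint8_t avg4_swar(const uint8_t *p)
{
    uint32_t w;
    memcpy(&w, p, sizeof w);      /* 32-bit load that is safe for any alignment */

    /* Add odd bytes to even bytes: two 16-bit partial sums, each at most 510. */
    uint32_t pairs = (w & 0x00FF00FFu) + ((w >> 8) & 0x00FF00FFu);

    /* Add the two partial sums: total is at most 1020, so it fits in 16 bits. */
    uint32_t sum = (pairs + (pairs >> 16)) & 0xFFFFu;

    return (uint8_t)(sum >> 2);   /* truncating average */
}

/* Shrink one row horizontally by 4; src must hold at least 4 * dst_len bytes. */
static void shrink_row_by_4(const uint8_t *src, uint8_t *dst, size_t dst_len)
{
    for (size_t i = 0; i < dst_len; i++)
        dst[i] = avg4_swar(src + 4 * i);
}
```

Both helpers truncate rather than round; adding 2 to the sum before the shift would give a rounded average instead. Because addition is commutative, the word-at-a-time version gives the same result on little-endian and big-endian targets.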

It's for a Cortex-A8 (ARMv7-A), and the data is aligned to 32 bytes or whatever I want.

Cheers,
Shervin Emami
http://www.shervinemami.co.cc/
  • Note: This was originally posted on 22nd September 2010 at http://forums.arm.com

    Wow, thanks so much guys, that's exactly what I needed to know!
    Glad to finally be part of the ARM community :-)


    No probs; glad to be of help. And welcome =) A good question to ask too - I love answering assembler hacking questions :)

    I've programmed assembler on a couple of register-based architectures (ARM and TI DSPs mainly), and I have to say whenever I look at writing x86 CISC assembler I really get put off by it (mostly I just find register-based architectures more intuitive). The more recent versions of the ARM architecture are really nice to write algorithms for; a mixture of ARM DSP and SIMD instructions, some of the newer ARM instructions in ARMv7 such as the wide constant loads, and of course NEON, make it really very flexible and a pleasure to write in =)