
How to efficiently sum 4 x 8bit integers with ARM or NEON

Note: This was originally posted on 17th September 2010 at http://forums.arm.com

Hi,

I am trying to write an ASM function to shrink an 8-bit greyscale image by a factor of 4, so I need to get the sum of 4 bytes very quickly. From what I have read, NEON needs at least 32-bit integers and VFP is for floats, so it looks like I should just stick with ARM (or Thumb-2) instructions.

But I'm just a beginner, so I'm wondering if there is a more efficient method of summing 4 consecutive bytes than converting each byte to a 32-bit int and then summing them (and then shifting right to get the average).

It's for a Cortex-A8 (ARMv7-A), and the data is aligned to 32 bytes or whatever I want.
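
For reference, the baseline approach described above (widen each byte to 32 bits, sum, then shift right) might look roughly like this in plain C. This is only a sketch: the function name `shrink_row_by_4` and its parameters are made up for illustration, and it assumes the shrink is done along one row, with every 4 consecutive source pixels averaged into one output pixel.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (name and signature are assumptions, not from the post):
   average each group of 4 consecutive source bytes into one destination byte.
   dst_len is the number of output pixels, so src must hold 4 * dst_len bytes. */
void shrink_row_by_4(const uint8_t *src, uint8_t *dst, size_t dst_len)
{
    for (size_t i = 0; i < dst_len; i++) {
        /* Widen each byte to 32 bits; the sum is at most 4 * 255 = 1020. */
        uint32_t sum = (uint32_t)src[4 * i]
                     + (uint32_t)src[4 * i + 1]
                     + (uint32_t)src[4 * i + 2]
                     + (uint32_t)src[4 * i + 3];
        /* Shift right by 2 to divide by 4 (truncates, no rounding). */
        dst[i] = (uint8_t)(sum >> 2);
    }
}
```

Any faster ARM or NEON version would need to produce the same output as this loop, so it can serve as a correctness check.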

Cheers,
Shervin Emami
http://www.shervinemami.co.cc/
  • Note: This was originally posted on 27th September 2010 at http://forums.arm.com

    ... and not have NEON alignment.


    There are some known bugs in the early implementations of alignment annotations in the GNU Assembler (GAS) syntax, so depending on how old your compiler is (4.2 is quite old, so I think it suffers from this bug) you may have to bodge your code to use the old syntax.

    In summary: the buggy implementation needed an extra ',' between the register and the alignment.

    @ Buggy form, which works on older GAS
    @ (two D registers, because :128 is not a valid alignment for a single D register)
    VLD1.8 {d0, d1}, [r1, :128]

    @ Correct form, which works on newer GAS (the old form is still supported though)
    VLD1.8 {d0, d1}, [r1 :128]


    See ...

    http://www.listware.net/201006/gnu-binutils/98009-rfa-arm-fix-neon-alignment-syntax-acceptance.html