  • Note: This was originally posted on 23rd November 2010 at http://forums.arm.com

    Not pretty, but:

    unsigned int t1,t2,t3,t4;

    t1  = *mn ^ *pq;
    t2  = t1 & 0x01010101;
    t3  = __uhadd8(*mn,*pq);
    t4  = __uhadd8(t2,*co);
    *mn = __uadd8(t3,t4);


    hth
    s.
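
To see what the five-step sequence above computes, here is a rough portable sketch that replaces the intrinsics with plain C models. The names `model_uhadd8`, `model_uadd8`, `map_lanes` and `avg3_wrapping` are made up for illustration and are not ARM intrinsics. Under these assumptions each byte lane ends up holding `(mn + pq + co) >> 1`, because `(mn ^ pq) & 1` is the bit that the first halving add drops. The final `__uadd8` still wraps, so a lane such as 255, 255, 2 comes out as 0 rather than 255.

```c
#include <stdint.h>

/* Portable per-byte models of the SIMD operations used in the post.
   All names here are hypothetical helpers, not ARM intrinsics. */

/* Apply a byte-wise operation to each of the four lanes of a 32-bit word. */
static uint32_t map_lanes(uint32_t a, uint32_t b,
                          uint8_t (*op)(uint8_t, uint8_t))
{
    uint32_t r = 0;
    for (int i = 0; i < 4; i++) {
        uint8_t x = (uint8_t)(a >> (8 * i));
        uint8_t y = (uint8_t)(b >> (8 * i));
        r |= (uint32_t)op(x, y) << (8 * i);
    }
    return r;
}

/* Halving add: the 9-bit sum shifted right by one, so it cannot overflow. */
static uint8_t hadd_u8(uint8_t x, uint8_t y) { return (uint8_t)((x + y) >> 1); }

/* Plain add: wraps modulo 256. */
static uint8_t add_u8(uint8_t x, uint8_t y) { return (uint8_t)(x + y); }

uint32_t model_uhadd8(uint32_t a, uint32_t b) { return map_lanes(a, b, hadd_u8); }
uint32_t model_uadd8(uint32_t a, uint32_t b)  { return map_lanes(a, b, add_u8); }

/* The sequence from the post, with the models in place of the intrinsics. */
uint32_t avg3_wrapping(uint32_t mn, uint32_t pq, uint32_t co)
{
    uint32_t t1 = mn ^ pq;
    uint32_t t2 = t1 & 0x01010101u;       /* low bit of mn + pq in each lane */
    uint32_t t3 = model_uhadd8(mn, pq);   /* (mn + pq) >> 1 */
    uint32_t t4 = model_uhadd8(t2, co);   /* (low bit + co) >> 1 */
    return model_uadd8(t3, t4);           /* may still wrap past 255 */
}
```

If the wrap in the last step is the overflow being asked about, one option might be to end with a saturating add (`__uqadd8`) instead of `__uadd8`, but that depends on where saturation is actually wanted.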
  • Note: This was originally posted on 24th November 2010 at http://forums.arm.com

    > Is there an instruction that takes care of the saturation?

    If you have NEON (for example, on Cortex-A8), you can use VQADD.U8, but you'll need to process 8 or 16 elements at a time.
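
A minimal sketch of the NEON route, assuming the data sits in plain byte arrays. The helper name `sat_add_u8` is invented for this example. The `vld1q_u8`, `vqaddq_u8` and `vst1q_u8` intrinsics come from `arm_neon.h`, and the scalar loop covers the tail as well as builds without NEON.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Saturating add of two byte arrays: dst[i] = min(255, a[i] + b[i]).
   sat_add_u8 is a hypothetical helper name. */
void sat_add_u8(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n)
{
    size_t i = 0;
#if defined(__ARM_NEON)
    /* 16 byte lanes per iteration. */
    for (; i + 16 <= n; i += 16) {
        uint8x16_t va = vld1q_u8(a + i);
        uint8x16_t vb = vld1q_u8(b + i);
        vst1q_u8(dst + i, vqaddq_u8(va, vb));  /* maps to VQADD.U8 on AArch32 */
    }
#endif
    /* Scalar tail, and the whole array on targets without NEON. */
    for (; i < n; i++) {
        unsigned s = (unsigned)a[i] + b[i];
        dst[i] = (uint8_t)(s > 255u ? 255u : s);
    }
}
```

Working on 16 bytes per step is what makes this worthwhile, so it fits best when the four-byte words being processed are part of a larger buffer.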
  • Note: This was originally posted on 24th November 2010 at http://forums.arm.com



    > *mn = __uhadd8(*mn,__uadd8(*pq,*co));
    >
    > but my question is how to avoid the overflow [255 + 2].
    >
    > Is there an instruction that takes care of the saturation?

    From your description it is not quite clear where exactly you want saturation to occur. In the meantime try this:

    void qadd(uint32_t *mnx4, uint32_t *pqx4, uint32_t *cox4)
    {
        uint32_t t1 = __uqadd8(*mnx4, *pqx4);  /* per-byte saturating add */
        uint32_t t2 = __uhadd8(t1, *cox4);     /* per-byte halving add */

        *mnx4 = t2;
    }

    I don't think your C code example implements what you really want. It might be best if you could tell us what the expected result is supposed to look like.

    Kind regards
    Marcus
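
For comparison, a portable sketch of what the `qadd` function above does to each byte lane, with scalar stand-ins for the two intrinsics. The names `lane_uqadd8`, `lane_uhadd8` and `qadd_model` are hypothetical. Here the clamp happens before the halving step, so each lane holds `(min(255, mn + pq) + co) >> 1`. A lane with mn = 255, pq = 2, co = 0 therefore gives 127, whereas an unclamped `(mn + pq + co) >> 1` would give 128. That difference is why the point at which saturation should occur matters.

```c
#include <stdint.h>

/* Hypothetical scalar stand-ins for one byte lane of __uqadd8 and __uhadd8. */
static uint8_t lane_uqadd8(uint8_t x, uint8_t y)
{
    unsigned s = (unsigned)x + y;
    return (uint8_t)(s > 255u ? 255u : s);   /* clamp instead of wrapping */
}

static uint8_t lane_uhadd8(uint8_t x, uint8_t y)
{
    return (uint8_t)(((unsigned)x + y) >> 1);
}

/* The same two steps as qadd, applied to each of the four byte lanes. */
uint32_t qadd_model(uint32_t mnx4, uint32_t pqx4, uint32_t cox4)
{
    uint32_t out = 0;
    for (int i = 0; i < 4; i++) {
        uint8_t mn = (uint8_t)(mnx4 >> (8 * i));
        uint8_t pq = (uint8_t)(pqx4 >> (8 * i));
        uint8_t co = (uint8_t)(cox4 >> (8 * i));
        uint8_t t1 = lane_uqadd8(mn, pq);     /* saturates at 255 */
        uint8_t t2 = lane_uhadd8(t1, co);     /* halving add, cannot overflow */
        out |= (uint32_t)t2 << (8 * i);
    }
    return out;
}
```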