Cortex M4 (LPC4370): fastest way to sum offset binary samples

Hi to you all,
I'm working on a project involving the LPC Link2 to evaluate its LPC4370 (the one on the board is actually the LPC4370JFET100) for real-time data processing: a more detailed description of my work was given in this question.
What I need to do is:

acquire samples at 40 MSPS (done)
move them into the central memory using DMA (done)
trigger the data processing when the input signal crosses a certain threshold (done)
process the data as fast as possible (TODO)

Basically I just need to sum the acquired samples. The ADC packs two 12-bit samples in offset binary (because the firmware uses thresholds) into one 32-bit word.
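To make the sample format concrete, here is a minimal sketch in plain C of how one packed word could be split and converted from offset binary to signed values. It assumes that each 12-bit sample sits in the low 12 bits of its 16-bit halfword and that the mid-scale code is 0x800; I have not confirmed this layout here, and the names `offset_binary_to_signed` and `sum_packed_word` are only illustrative.

```c
#include <stdint.h>

/* ASSUMPTION: each 12-bit sample occupies bits [11:0] of its 16-bit halfword;
   the upper 4 bits of each halfword are ignored. */
#define SAMPLE_MASK   0x0FFFu
#define SAMPLE_OFFSET 0x0800   /* mid-scale code of a 12-bit offset-binary value */

/* Convert one 12-bit offset-binary code (0..4095) to a signed value (-2048..2047). */
static inline int32_t offset_binary_to_signed(uint32_t code)
{
    return (int32_t)(code & SAMPLE_MASK) - SAMPLE_OFFSET;
}

/* Sum the two samples packed in one 32-bit word, as signed values. */
static inline int32_t sum_packed_word(uint32_t word)
{
    int32_t lo = offset_binary_to_signed(word);        /* low halfword  */
    int32_t hi = offset_binary_to_signed(word >> 16);  /* high halfword */
    return lo + hi;
}
```

With this convention the code 0x800 maps to 0, 0x000 to -2048 and 0xFFF to +2047.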
Thanks to Thibaut ZEISSLOFF's code I was able to extract maximum and minimum very fast: now I'm trying to adapt another of his algorithms (kindly published on his interesting blog m4-unleashed.com).

Here's my code:

__RAMFUNC(RAM) void sum_SMLAD(int32_t* pSrc, uint32_t pSize)
{
    int32_t sum = 0;
    uint32_t pair, loop = pSize >> 2;

    while ((loop-- > 0) && wordsLeft)
    {
        pair = *__SIMD32(pSrc)++;
        sum = __SMLAD(pair, 0b10000000000000011000000000000001u, sum);
        pair = *__SIMD32(pSrc)++;
        sum = __SMLAD(pair, 0b10000000000000011000000000000001u, sum);
        
        wordsLeft -= 2;
    }

    if (pSize & 0x2)
    {
        pair = *__SIMD32(pSrc)++;
        sum = __SMLAD(pair, 0b10000000000000011000000000000001u, sum);
        
        wordsLeft -= 1;
    }

    if(wordsLeft == 0)
    {
    	peaksCounter -= 1;
    	accumulator[peaksCounter] = sum;
    }
    return;
}

Unfortunately I always run into a HardFault within the first couple of iterations of the main while loop.
Here's how SMLAD is defined in core_cm4_simd.h:

__attribute__( ( always_inline ) ) __STATIC_INLINE uint32_t __SMLAD (uint32_t op1, uint32_t op2, uint32_t op3)
{
  uint32_t result;

  __ASM volatile ("smlad %0, %1, %2, %3" : "=r" (result) : "r" (op1), "r" (op2), "r" (op3) );
  return(result);
}

Since the SMLAD intrinsic basically takes two words, computes (first_top_halfword*second_top_halfword)+(first_bottom_halfword*second_bottom_halfword) and adds the accumulator, I'm trying to multiply by one, and I'm possibly facing two issues:
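To check my understanding of the instruction off-target, here is a small reference model of SMLAD in portable C. It is only a sketch: `smlad_model` and `sext16` are hypothetical helper names, and the model ignores the accumulator overflow (Q flag) behaviour of the real instruction. The point it illustrates is that both halfwords of each operand are treated as signed 16-bit values, so a "multiply by one" constant is 0x00010001, while a halfword such as 0x8001 is read as -32767.

```c
#include <stdint.h>

/* Sign-extend the low 16 bits of x to a 32-bit signed value. */
static int32_t sext16(uint32_t x)
{
    x &= 0xFFFFu;
    return (x & 0x8000u) ? (int32_t)x - 0x10000 : (int32_t)x;
}

/* Reference model of SMLAD (hypothetical helper, not part of CMSIS):
   result = acc + lo16(a) * lo16(b) + hi16(a) * hi16(b),
   with all four halfwords interpreted as signed 16-bit integers.
   Accumulator overflow is NOT modelled; keep test values small. */
static int32_t smlad_model(uint32_t a, uint32_t b, int32_t acc)
{
    int32_t lo = sext16(a) * sext16(b);
    int32_t hi = sext16(a >> 16) * sext16(b >> 16);
    return acc + lo + hi;
}
```

For example, `smlad_model(0x00030005u, 0x00010001u, 10)` gives 10 + 5 + 3 = 18, whereas with 0x80018001 as second operand each halfword of the first operand is multiplied by -32767.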

I have 12-bit samples, not 16-bit ones (maybe a shift is needed?)
They are in offset binary, and I'm a bit confused about what multiplying by one means in this case.
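One way to look at both points, written as a plain C sketch rather than a tested solution: if each 12-bit code sits in the low 12 bits of its halfword (an assumption on my side), the raw codes are small positive numbers, so they can be summed as they are and the offset can be removed once at the end, since sum(code - 0x800) = sum(code) - N * 0x800. The function name `sum_offset_binary` is hypothetical, and the loop uses ordinary C instead of SIMD intrinsics so that it runs anywhere.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum 2 * n_words offset-binary samples and return the signed total.
   ASSUMPTION: each sample occupies bits [11:0] of its 16-bit halfword.
   The raw sum fits in 32 bits for up to roughly 260000 words. */
static int32_t sum_offset_binary(const uint32_t *words, size_t n_words)
{
    uint32_t raw = 0;

    for (size_t i = 0; i < n_words; ++i) {
        raw += words[i] & 0x0FFFu;          /* low-halfword sample  */
        raw += (words[i] >> 16) & 0x0FFFu;  /* high-halfword sample */
    }

    /* Remove the mid-scale offset (0x800) once per sample. */
    return (int32_t)raw - (int32_t)(2u * n_words) * 0x800;
}
```

Under this assumption no shift is needed before the multiply-accumulate, because a 12-bit code held in a 16-bit halfword is never negative when read as a signed 16-bit value.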

Thanks for your patience: any help would be highly appreciated!

Regards,

Andrea