Cortex M4 (LPC4370): fastest way to sum offset binary samples

Hi to you all,
I'm working on a project involving the LPC Link2 to evaluate its LPC4370 (the one on the board is actually the LPC4370JFET100) for real-time data processing. A more detailed description of my work was given in this question.
What I need to do is:

  • acquire samples at 40 MSPS (done)
  • move them into main memory using DMA (done)
  • trigger the data processing when the input signal crosses a certain threshold (done)
  • process the data as fast as possible (TODO)

Basically I just need to sum the acquired samples. The ADC packs two 12-bit samples in offset binary (because the firmware uses thresholds) into one 32-bit word.
Thanks to 's code I was able to extract maximum and minimum very fast: now I'm trying to adapt another of his algorithms (kindly published on his interesting blog m4-unleashed.com).

Here's my code:

__RAMFUNC(RAM) void sum_SMLAD(int32_t* pSrc, uint32_t pSize)
{
    int32_t sum = 0;
    uint32_t pair, loop = pSize >> 2;

    while ((loop-- > 0) && wordsLeft)
    {
        pair = *__SIMD32(pSrc)++;
        sum = __SMLAD(pair, 0b10000000000000011000000000000001u, sum);
        pair = *__SIMD32(pSrc)++;
        sum = __SMLAD(pair, 0b10000000000000011000000000000001u, sum);
        
        wordsLeft -= 2;
    }

    if (pSize & 0x2)
    {
        pair = *__SIMD32(pSrc)++;
        sum = __SMLAD(pair, 0b10000000000000011000000000000001u, sum);
        
        wordsLeft -= 1;
    }

    if(wordsLeft == 0)
    {
    	peaksCounter -= 1;
    	accumulator[peaksCounter] = sum;
    }
    return;
}

Unfortunately I always run into a HardFault within the first couple of iterations of the main while loop.
Here's how SMLAD is defined in core_cm4_simd.h:

__attribute__( ( always_inline ) ) __STATIC_INLINE uint32_t __SMLAD (uint32_t op1, uint32_t op2, uint32_t op3)
{
  uint32_t result;

  __ASM volatile ("smlad %0, %1, %2, %3" : "=r" (result) : "r" (op1), "r" (op2), "r" (op3) );
  return(result);
}
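
To check the arithmetic off-target, a host-side model of what the instruction computes can help. The following is only a sketch: `smlad_model` is a hypothetical name, not part of CMSIS, and it assumes the usual two's-complement behaviour when narrowing to `int16_t`.

```c
#include <stdint.h>

/* Hypothetical host-side model of SMLAD (not part of CMSIS).
 * Both halfwords of op1 and op2 are read as SIGNED 16-bit values,
 * multiplied pairwise, and both products are added to op3.
 * Assumes two's-complement narrowing to int16_t, as on common compilers. */
static uint32_t smlad_model(uint32_t op1, uint32_t op2, uint32_t op3)
{
    int32_t a_lo = (int16_t)(op1 & 0xFFFFu);   /* bottom halfword of op1 */
    int32_t a_hi = (int16_t)(op1 >> 16);       /* top halfword of op1    */
    int32_t b_lo = (int16_t)(op2 & 0xFFFFu);   /* bottom halfword of op2 */
    int32_t b_hi = (int16_t)(op2 >> 16);       /* top halfword of op2    */

    /* Each product fits in 32 bits; the unsigned additions wrap modulo 2^32. */
    return op3 + (uint32_t)(a_lo * b_lo) + (uint32_t)(a_hi * b_hi);
}
```

Under this model a coefficient word of 0x00010001 multiplies each halfword by one, whereas a halfword of 0x8001 is read as -32767.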


Since the SMLAD intrinsic takes two words and computes (first_top_halfword*second_top_halfword)+(first_bottom_halfword*second_bottom_halfword) plus the accumulator, I'm trying to multiply each halfword by one, and I'm possibly facing two issues:

  1. I have 12-bit samples, not 16-bit ones (maybe a shift is needed?)
  2. They are in offset binary, and I'm a bit confused about what multiplying by one means in this case.
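
Both points can be worked through in plain C first. The sketch below rests on assumptions that are not confirmed in this thread: each sample sits in the low 12 bits of its halfword (bits 11:0 and 27:16), any other bits are masked off, and mid-scale (2048) represents zero. The names `SAMPLE_MASK`, `ADC_MIDSCALE` and `sum_offset_binary` are made up for illustration.

```c
#include <stdint.h>
#include <stddef.h>

#define SAMPLE_MASK  0x0FFF0FFFu  /* assumption: samples in bits 11:0 and 27:16 */
#define ADC_MIDSCALE 2048         /* assumption: 12-bit offset binary, 0x800 == 0 */

/* Sums 2 * n_words packed samples and returns the signed total.
 * A masked 12-bit value is positive even when read as a signed
 * halfword, so no shift is needed: the raw values can be added
 * as they are and the offset removed once at the end. */
static int32_t sum_offset_binary(const uint32_t *words, size_t n_words)
{
    int32_t raw_sum = 0;

    for (size_t i = 0; i < n_words; ++i) {
        uint32_t w = words[i] & SAMPLE_MASK;
        /* Same result as multiplying each halfword by one and accumulating. */
        raw_sum += (int32_t)(w & 0xFFFFu) + (int32_t)(w >> 16);
    }

    /* Every sample carries a +2048 offset; subtract it for all of them at once. */
    return raw_sum - (int32_t)(2 * n_words) * ADC_MIDSCALE;
}
```

With 12-bit samples a 32-bit accumulator has room for several hundred thousand raw values before it can overflow, so deferring the offset correction to the end should be safe for short bursts.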

Thanks for your patience: any help would be highly appreciated! 

Regards,

Andrea

Parents
  • Is there a way to add a good old printf log before each SMLAD call, in order to check:

    • if smlad is called before the error
    • what values were passed when the fault occurs

    I see that you're using SIMD32 instructions. Is the data word-aligned? SIMD32 is more likely to cause various problems with unaligned data accesses.

    One way to check if it's not the SIMD32 instructions causing the error would be to replace them with simple instructions.
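
    As a rough sketch of both suggestions, the helpers below check word alignment and compute the raw halfword sum without any SIMD32 access, so the result can be compared against the SMLAD version. `is_word_aligned` and `sum_halfwords_scalar` are hypothetical names, not CMSIS functions.

```c
#include <stdint.h>
#include <stddef.h>

/* Returns 1 if p is aligned to a 4-byte boundary, 0 otherwise. */
static int is_word_aligned(const void *p)
{
    return ((uintptr_t)p & 0x3u) == 0;
}

/* Plain-C reference: adds both 16-bit halves of every word as
 * unsigned values, using only ordinary word loads. */
static uint32_t sum_halfwords_scalar(const uint32_t *words, size_t n_words)
{
    uint32_t sum = 0;

    for (size_t i = 0; i < n_words; ++i) {
        sum += words[i] & 0xFFFFu;  /* bottom halfword */
        sum += words[i] >> 16;      /* top halfword    */
    }
    return sum;
}
```

    If the scalar loop also faults, the SIMD32 instructions are probably not the culprit, and the pointer or the length passed in would be the next thing to inspect.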
