Cortex M4 (LPC4370): fastest way to sum offset binary samples

Hi all,
I'm working on a project involving the LPC Link2 to evaluate its LPC4370 (the one on the board is actually the LPC4370JFET100) for real-time data processing: a more detailed description of my work was given in this question.
What I need to do is:

acquire samples at 40 MSPS (done)
move them into main memory using DMA (done)
trigger the data processing when the input signal crosses a certain threshold (done)
process the data as fast as possible (TODO)

Basically, I just need to sum the acquired samples. The ADC packs two 12-bit samples, in offset binary (because the firmware uses thresholds), into one 32-bit word.
Thanks to Thibaut ZEISSLOFF's code I was able to extract the maximum and minimum very fast; now I'm trying to adapt another of his algorithms (kindly published on his interesting blog, m4-unleashed.com).

Here's my code:

__RAMFUNC(RAM) void sum_SMLAD(int32_t* pSrc, uint32_t pSize)
{
    int32_t sum = 0;
    uint32_t pair, loop = pSize >> 2;

    while ((loop-- > 0) && wordsLeft)
    {
        pair = *__SIMD32(pSrc)++;
        sum = __SMLAD(pair, 0b10000000000000011000000000000001u, sum);
        pair = *__SIMD32(pSrc)++;
        sum = __SMLAD(pair, 0b10000000000000011000000000000001u, sum);
        
        wordsLeft -= 2;
    }

    if (pSize & 0x2)
    {
        pair = *__SIMD32(pSrc)++;
        sum = __SMLAD(pair, 0b10000000000000011000000000000001u, sum);
        
        wordsLeft -= 1;
    }

    if (wordsLeft == 0)
    {
        peaksCounter -= 1;
        accumulator[peaksCounter] = sum;
    }
    return;
}

Unfortunately, I always run into a HardFault within the first couple of iterations of the main while loop.
Here's how __SMLAD is defined in core_cm4_simd.h:

__attribute__( ( always_inline ) ) __STATIC_INLINE uint32_t __SMLAD (uint32_t op1, uint32_t op2, uint32_t op3)
{
  uint32_t result;

  __ASM volatile ("smlad %0, %1, %2, %3" : "=r" (result) : "r" (op1), "r" (op2), "r" (op3) );
  return(result);
}

Since the SMLAD intrinsic basically takes two words, computes (first_top_halfword*second_top_halfword)+(first_bottom_halfword*second_bottom_halfword) and adds the result to the accumulator, I'm trying to multiply by one, and I'm possibly facing two issues:

I have 12-bit samples, not 16-bit ones (maybe a shift is needed?)
They are in offset binary, and I'm a bit confused about what multiplying by one means in this case.

Thanks for your patience: any help would be highly appreciated!

Regards,

Andrea


+2 Thibaut ZEISSLOFF over 9 years ago in reply to Andrea Bettati

Hi Andrea,

If I understand correctly, you get samples in the range [0; 4095] that represent signed values between -2048 and 2047.

If that is the case, you can look at your problem the following way: write r[i] for the raw samples (with offset) and s[i] for the corresponding signed samples, so that s[i] = r[i] - 2048.

You can perform the sum on the raw samples directly and then remove the offset:

raw_sum = r[0] + r[1] + ... + r[N-1] = s[0] + ... + s[N-1] + N*2048

signed_sum = raw_sum - N*2048
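
The identity above can be checked with a small portable C sketch. It assumes the samples have already been unpacked into one 16-bit value each; the names `ADC_OFFSET`, `sum_signed_per_sample` and `sum_raw_then_correct` are made up for this illustration and do not come from the original code.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed mid-scale code of a 12-bit offset-binary sample. */
#define ADC_OFFSET 2048

/* Reference version: convert every raw code to signed, then accumulate. */
int32_t sum_signed_per_sample(const uint16_t *raw, size_t n)
{
    int32_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += (int32_t)raw[i] - ADC_OFFSET;
    }
    return sum;
}

/* Accumulate the raw codes and remove the offset once, at the end. */
int32_t sum_raw_then_correct(const uint16_t *raw, size_t n)
{
    int32_t raw_sum = 0;
    for (size_t i = 0; i < n; i++) {
        raw_sum += raw[i];
    }
    return raw_sum - (int32_t)n * ADC_OFFSET;
}
```

Both functions return the same value as long as the raw sum fits in 32 bits, which with 12-bit codes holds for roughly half a million samples.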

Regarding the SMLAD usage, I'm not quite sure why you are using the constant 0x80018001: it means that you multiply each raw sample by -32767 (0x8001 read as a signed 16-bit value).

In order to multiply each raw sample by 1, your constant would need to be 0x00010001, or 0b00000000000000010000000000000001.

Let me know if I did not understand your need properly!

Thanks for the mention, by the way!
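
To make the effect of the two constants concrete, here is a minimal portable sketch of what SMLAD computes: the signed 16x16 products of the bottom and top halfwords, both added to the accumulator. The names `sign_extend16` and `smlad_model` are hypothetical, this is not the CMSIS intrinsic, and the Q flag that the real instruction sets on accumulate overflow is not modelled.

```c
#include <stdint.h>

/* Interpret the low 16 bits of h as a signed two's-complement halfword. */
static int32_t sign_extend16(uint32_t h)
{
    int32_t v = (int32_t)(h & 0xFFFFu);
    return (v & 0x8000) ? v - 0x10000 : v;
}

/* Software model of SMLAD (illustrative only):
 * result = acc + lo(op1)*lo(op2) + hi(op1)*hi(op2), wrapping modulo 2^32. */
uint32_t smlad_model(uint32_t op1, uint32_t op2, uint32_t acc)
{
    int32_t lo = sign_extend16(op1) * sign_extend16(op2);
    int32_t hi = sign_extend16(op1 >> 16) * sign_extend16(op2 >> 16);
    return acc + (uint32_t)lo + (uint32_t)hi;
}
```

With a word packing the raw samples 3000 (top halfword) and 100 (bottom halfword), the constant 0x00010001 yields 3100, whereas 0x80018001 yields -32767 * 3100.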

Regards,

Thibaut

0 Andrea Bettati over 9 years ago in reply to Thibaut ZEISSLOFF

Thanks Thibaut ZEISSLOFF, I get what you mean.
I was trying to multiply by those numbers just because I had switched my brain off and looked at a Wikipedia table.

Anyway, do you suggest using SMLAD? I should remove the offset before using SMLAD anyway, right?

You're welcome, I had to mention you and your blog. Your work is helping me a lot.

0 Thibaut ZEISSLOFF over 9 years ago in reply to Andrea Bettati

Yes, I strongly recommend using SMLAD on the raw samples directly and removing the offset only from the resulting sum. This gives the same result with N multiply-accumulates and 1 subtraction instead of N subtractions and N MACs!

You can even initialize your accumulator with -N*offset and then apply N/2 SMLADs to it!
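
A minimal sketch of that pre-biased accumulator, written in portable C so it can be tried off-target. It assumes each 32-bit word holds two raw 12-bit codes, one in the low bits of each halfword, and that the total fits in 32 bits. `ADC_OFFSET`, `smlad_model` and `sum_packed_offset_binary` are hypothetical names; on the Cortex-M4 the call to `smlad_model` would presumably be replaced by the `__SMLAD` intrinsic.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed mid-scale code of a 12-bit offset-binary sample. */
#define ADC_OFFSET 2048

/* Interpret the low 16 bits of h as a signed two's-complement halfword. */
static int32_t sign_extend16(uint32_t h)
{
    int32_t v = (int32_t)(h & 0xFFFFu);
    return (v & 0x8000) ? v - 0x10000 : v;
}

/* Stand-in for SMLAD: acc + lo*lo + hi*hi (no overflow handling). */
static int32_t smlad_model(uint32_t op1, uint32_t op2, int32_t acc)
{
    return acc
         + sign_extend16(op1) * sign_extend16(op2)
         + sign_extend16(op1 >> 16) * sign_extend16(op2 >> 16);
}

/* Sum n_words packed words (two samples each) as signed values. */
int32_t sum_packed_offset_binary(const uint32_t *words, size_t n_words)
{
    /* Start from -N*offset, where N = 2 * n_words samples. */
    int32_t acc = -(int32_t)(2 * n_words) * ADC_OFFSET;

    for (size_t i = 0; i < n_words; i++) {
        /* Multiply both halfwords by 1 and accumulate. */
        acc = smlad_model(words[i], 0x00010001u, acc);
    }
    return acc;
}
```

Because the bias is folded into the initial value, the loop body contains nothing but the dual multiply-accumulate, and no correction is needed after it.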