Cortex M4 (LPC4370): fastest way to sum offset binary samples

Hi to you all,
I'm working on a project involving the LPC Link2 to evaluate its LPC4370 (the one on the board is actually the LPC4370JFET100) for real-time data processing: a more detailed description of my work was given in this question.
What I need to do is:

acquire samples at 40 MSPS (done)
move them into the central memory using DMA (done)
trigger the data processing when the input signal crosses a certain threshold (done)
process the data as fast as possible (TODO)

Basically I just need to sum the acquired samples. The ADC packs two 12-bit samples, in offset binary (because the firmware uses thresholds), into one 32-bit word.
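As an illustration of the packing described above, here is a minimal sketch that unpacks one word and converts a raw code to a signed value. The exact bit layout is an assumption, not something stated in the post: one sample in bits 11..0, the other in bits 27..16, with the remaining bits ignored. The helper names are hypothetical.

```c
#include <stdint.h>

/* ASSUMED layout (not confirmed by the post): first sample in bits 11..0,
   second sample in bits 27..16. The top four bits of each halfword are
   masked off in case they carry something other than sample data. */
static inline uint16_t sample_lo(uint32_t word)
{
    return (uint16_t)(word & 0x0FFFu);
}

static inline uint16_t sample_hi(uint32_t word)
{
    return (uint16_t)((word >> 16) & 0x0FFFu);
}

/* 12-bit offset binary to signed: code 0 -> -2048, 2048 -> 0, 4095 -> +2047. */
static inline int16_t offset_to_signed(uint16_t raw)
{
    return (int16_t)((int32_t)raw - 2048);
}
```

With this layout, the word 0x08000FFF would hold the codes 0x0FFF (low half) and 0x0800 (high half), that is, +2047 and 0 after removing the offset.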
Thanks to Thibaut ZEISSLOFF's code I was able to extract maximum and minimum very fast: now I'm trying to adapt another of his algorithms (kindly published on his interesting blog m4-unleashed.com).

Here's my code:

__RAMFUNC(RAM) void sum_SMLAD(int32_t* pSrc, uint32_t pSize)
{
    int32_t sum = 0;
    uint32_t pair, loop = pSize >> 2;

    while ((loop-- > 0) && wordsLeft)
    {
        pair = *__SIMD32(pSrc)++;
        sum = __SMLAD(pair, 0b10000000000000011000000000000001u, sum);
        pair = *__SIMD32(pSrc)++;
        sum = __SMLAD(pair, 0b10000000000000011000000000000001u, sum);
        
        wordsLeft -= 2;
    }

    if (pSize & 0x2)
    {
        pair = *__SIMD32(pSrc)++;
        sum = __SMLAD(pair, 0b10000000000000011000000000000001u, sum);
        
        wordsLeft -= 1;
    }

    if(wordsLeft == 0)
    {
        peaksCounter -= 1;
        accumulator[peaksCounter] = sum;
    }
    return;
}

Unfortunately I always run into a HardFault within the first couple of iterations of the main while loop.
Here's how SMLAD is defined in core_cm4_simd.h:

__attribute__( ( always_inline ) ) __STATIC_INLINE uint32_t __SMLAD (uint32_t op1, uint32_t op2, uint32_t op3)
{
  uint32_t result;

  __ASM volatile ("smlad %0, %1, %2, %3" : "=r" (result) : "r" (op1), "r" (op2), "r" (op3) );
  return(result);
}

Since the SMLAD intrinsic basically takes two words and then computes (first_top_halfword*second_top_halfword)+(first_bottom_halfword*second_bottom_halfword), I'm trying to multiply by one, and I'm possibly facing two issues:

I have 12-bit samples, not 16-bit ones (maybe a shift is needed?)
They are in offset binary, and I'm a bit confused about what multiplying by one means in this case.
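To make the arithmetic concrete, the following is a portable C model of what SMLAD computes, usable on a PC to check constants before running on the target. The function name smlad_model is hypothetical, and the Q-flag side effect of the real instruction is deliberately left out.

```c
#include <stdint.h>

/* Portable model of SMLAD: both halfwords of each operand are read as signed
   16-bit values, the two products are formed, and both are added to acc.
   The Q flag set by the real instruction on overflow is NOT modelled.
   The uint -> int16_t casts rely on two's-complement conversion, which is
   what GCC and Clang implement. */
static uint32_t smlad_model(uint32_t op1, uint32_t op2, uint32_t acc)
{
    int32_t lo1 = (int16_t)(op1 & 0xFFFFu);
    int32_t hi1 = (int16_t)(op1 >> 16);
    int32_t lo2 = (int16_t)(op2 & 0xFFFFu);
    int32_t hi2 = (int16_t)(op2 >> 16);

    /* Accumulate in unsigned arithmetic so that wraparound is well defined. */
    return acc + (uint32_t)(lo1 * lo2) + (uint32_t)(hi1 * hi2);
}
```

Under this model a 12-bit code in 0..4095 is just a small positive 16-bit value, so no shift is needed as long as the upper four bits of each halfword are zero (an assumption about the data, not something the post confirms).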

Thanks for your patience: any help would be highly appreciated!

Regards,

Andrea

0 Myy over 7 years ago
Is there a way to add a good old printf log before each smlad call, in order to check:

whether smlad is called before the error

what values were passed when the fault occurs

I see that you're using SIMD32 instructions. Is the data well aligned? SIMD32 is more likely to cause various problems with unaligned data accesses.

One way to check whether the SIMD32 instructions are causing the error would be to replace them with plain instructions.
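A minimal sketch of such a plain replacement is shown below: it sums the samples with ordinary shifts, masks and adds, so its result can be compared against the SMLAD version. The function name sum_plain is hypothetical, and the bit layout (samples in bits 11..0 and 27..16 of each word) is an assumption.

```c
#include <stdint.h>

/* Reference sum without SIMD intrinsics.
   words: number of 32-bit words in the buffer (two samples per word).
   ASSUMED layout: one 12-bit sample in bits 11..0, one in bits 27..16. */
static uint32_t sum_plain(const uint32_t *src, uint32_t words)
{
    uint32_t sum = 0;

    for (uint32_t i = 0; i < words; ++i) {
        uint32_t w = src[i];
        sum += (w & 0x0FFFu) + ((w >> 16) & 0x0FFFu);
    }
    return sum; /* raw sum, offset not removed */
}
```

If this version runs cleanly on the same buffer while the SIMD version faults, the pointer handling around the SIMD32 access is the first thing to inspect.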

0 Andrea Bettati over 7 years ago in reply to Myy
Hi Myy, thanks for your reply. Sorry for the late answer, but I went through the whole firmware over the last few days and found a huge bug, which I explained in this other question.

Anyway, regarding the problem here discussed, I managed to have no errors: I forgot the

wordsLeft -= 2;

line and once I added it, it "worked" (meaning no HardFaults).

Yes, the buffers are all well aligned, except the accumulator array. That is declared as follows:
__DATA(RAM4) static uint32_t accumulator[PEAKS_NUM];

While the pSrc buffer is declared the same way, but in the RAM2 section.
Here's my memory layout:

To get reasonable results I think I need to modify the call to SMLAD because of the 12-bit offset binary format used by the ADCHS.
Now I'll try to figure out how to manage those samples in SMLAD, if you have any suggestion feel free to contribute!
Thanks again for your reply by the way.
+2 Thibaut ZEISSLOFF over 7 years ago in reply to Andrea Bettati

Hi Andrea,

If I understand correctly, you get samples in the range [0; 4095] that represent signed values between -2048 and 2047.

If that is the case, you can look at your problem the following way: write r[i] for the raw samples (with offset) and s[i] for the corresponding signed samples, so that s[i] = r[i] - 2048.

You can perform the sum on the raw samples directly and then remove the offset:

raw_sum = r[0] + r[1] + ... + r[N-1] = s[0] + ... + s[N-1] + N*2048

signed_sum = raw_sum - N*2048
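The two formulas above can be sketched in C as follows. The names ADC_OFFSET and remove_offset are hypothetical, chosen only for this illustration.

```c
#include <stdint.h>

#define ADC_OFFSET 2048 /* mid-scale code of a 12-bit offset-binary sample */

/* signed_sum = raw_sum - N * 2048, applied once after the accumulation. */
static int32_t remove_offset(uint32_t raw_sum, uint32_t n_samples)
{
    return (int32_t)raw_sum - (int32_t)(n_samples * ADC_OFFSET);
}
```

For example, the raw codes 0, 2048, 4095 and 2049 correspond to the signed values -2048, 0, 2047 and 1. Their raw sum is 8192 and N*2048 is also 8192, so the signed sum is 0, which matches -2048 + 0 + 2047 + 1. Note that a 32-bit signed result holds roughly half a million full-scale samples before the raw sum overflows.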

Regarding SMLAD usage, I'm not quite sure why you are using the constant 0x80018001: it means that you multiply each raw sample by -32767 (0x8001).

In order to multiply by 1 each raw sample, your constant would need to be 0x00010001 or 0b00000000000000010000000000000001.
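The reason 0x8001 acts as -32767 is that each 16-bit half of the operand is read as a two's-complement number. A small sketch of that interpretation follows; halfword_as_signed is a hypothetical helper name.

```c
#include <stdint.h>

/* How a signed 16x16 multiply sees one halfword of its operand:
   values with bit 15 set are negative (two's complement). */
static int32_t halfword_as_signed(uint16_t h)
{
    return (h & 0x8000u) ? (int32_t)h - 65536 : (int32_t)h;
}
```

Here halfword_as_signed(0x8001) gives -32767 and halfword_as_signed(0x0001) gives 1, while a 12-bit code such as 0x0FFF stays at 4095 because bit 15 is clear.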

Let me know if I did not understand your need properly!

Thanks for the mention, by the way!

Regards,

Thibaut
0 Andrea Bettati over 7 years ago in reply to Thibaut ZEISSLOFF

Thanks Thibaut ZEISSLOFF, I got what you mean.
I was trying to multiply by those numbers just because I had switched my brain off and looked at a Wikipedia table:

Anyway, do you suggest using SMLAD? I should remove the offset before using SMLAD anyway, right?

You're welcome, I had to mention you and your blog. Your work is helping me a lot.
0 Thibaut ZEISSLOFF over 7 years ago in reply to Andrea Bettati

Yes, I strongly recommend using SMLAD on the raw samples directly and removing the offset only from the resulting sum. This gives the same result with N multiply-accumulates and 1 subtraction instead of N subtractions and N MACs!

You can even initialize your accumulator with -N*offset and then apply N/2 SMLADs to it!
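A portable sketch of the preloaded-accumulator idea is given below, so it can be checked off-target. The function name sum_signed_preloaded is hypothetical, and the code assumes the upper four bits of every halfword are zero. On the Cortex-M4 the marked line would be the SMLAD call with the 0x00010001 constant.

```c
#include <stdint.h>

/* Sum of signed samples, with the offset folded into the initial value.
   words: number of 32-bit words (two 12-bit offset-binary samples each).
   ASSUMPTION: bits 15..12 and 31..28 of every word are zero. */
static int32_t sum_signed_preloaded(const uint32_t *src, uint32_t words)
{
    uint32_t n = 2u * words;              /* N samples in total   */
    int32_t acc = -(int32_t)(n * 2048u);  /* start at -N * offset */

    for (uint32_t i = 0; i < words; ++i) {
        uint32_t w = src[i];
        /* On target: acc = (int32_t)__SMLAD(w, 0x00010001u, (uint32_t)acc); */
        acc += (int32_t)(int16_t)(w & 0xFFFFu) + (int32_t)(int16_t)(w >> 16);
    }
    return acc; /* already offset-free, no final subtraction needed */
}
```

The loop body is one dual multiply-accumulate per word, so N samples take N/2 iterations and the result needs no correction afterwards.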