
Process ADC data, moved by DMA, using CMSIS DSP: what's the right way?

Hi to you all,
I have firmware running on an NXP LPCLink2 (LPC4370: 204 MHz Cortex-M4 MCU) board which basically does this:

  • Fills the ADC FIFO at 40 Msps.
  • Copies the data into memory using the built-in DMA Controller and 2 linked buffers.
  • Processes one buffer while the other is being filled.

My problem is that my code is too slow, and every now and then an overwrite occurs.

Using the DMA I'm saving the ADC data, which I get in two's complement format (offset binary is also available), into a uint32_t buffer, and I try to prepare it for the CMSIS DSP functions by converting the buffer to float32_t: this conversion is where the overwrite occurs. It's worth saying that I'm currently using software floating point, not hardware.
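
To make the step concrete, the conversion I'm doing is roughly along these lines (a simplified sketch, not my exact code; names are illustrative and I'm assuming 12-bit samples packed in the low bits of each FIFO word):

     /* Sketch of the conversion step: unpack 12-bit two's complement samples
        from the DMA buffer and convert them to float32_t for CMSIS DSP. */
     #include "arm_math.h"

     static void buffer_to_float(const uint32_t *src, float32_t *dst, uint32_t len)
     {
          for (uint32_t n = 0; n < len; n++)
          {
               int32_t v = (int32_t)(src[n] << 20) >> 20;  /* sign-extend 12 bits */
               dst[n] = (float32_t)v;
          }
     }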


The CMSIS library also accepts fractional formats like q31_t, q15_t and so on, and since I don't strictly need floating-point maths I could even use these formats if that would save me precious time (see the sketch below).
It feels like I'm missing something important about this step. That's no surprise, since this is my first project on a complex MCU; any help, hint or advice would be highly appreciated and would help me with my thesis.
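
For concreteness, the q15 route I have in mind would look something like this (a sketch only; the function name is mine, and arm_rms_q15 is just an example of a q15 CMSIS DSP call):

     /* Sketch: unpack 12-bit samples, scale them to full q15 range, then run a
        q15 CMSIS DSP routine on the block. */
     #include "arm_math.h"

     static void process_block_q15(const uint32_t *src, q15_t *work, uint32_t len)
     {
          for (uint32_t n = 0; n < len; n++)
          {
               int32_t v = (int32_t)(src[n] << 20) >> 20;  /* sign-extend 12 bits */
               work[n]   = (q15_t)(v << 4);                /* -2048..2047 -> q15 full scale */
          }

          q15_t rms;
          arm_rms_q15(work, len, &rms);
          /* ... use rms, or any other q15 processing ... */
     }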

I'll leave here the link to the (more detailed) question I asked on the NXP forums, just in case: LPC4370: ADCHS, GPDMA and CMSIS DSP | NXP Community.

Thanks in advance!

  • Building upon goodwin's answer, I'd like to suggest a complete loop:

    #include "arm_math.h"   /* for float32_t */

    void Twos2Dec_Remapp(const uint32_t *twosBuff, float32_t *decBuff, uint32_t buffLength)
    {
         register int32_t        i;
         register float32_t      *d;
         register const uint32_t *s;

         s = &twosBuff[buffLength];
         d = &decBuff[buffLength];
         i = -(int32_t)buffLength;  /* convert length to negative index for speed */
         do
         {
              /* assumes offset-binary samples (zero at code 2048); for
                 two's complement input, sign-extend the 12 bits instead */
              d[i] = (float32_t)((int32_t)(s[i] & 0xfff) - 2048);
         }
         while (++i);
    }
    

    I believe the above would produce close to optimal binary code (by 'hinting' to the compiler how to do it).

    The most expensive part is converting to float! In fact, I think it's a very bad idea to use floats if you don't have hardware floating point.

    Remember that with software floating point, every float operation expands into a call to a library routine, i.e. a whole block of instructions, while an ordinary integer instruction usually takes 1 or 2 clock cycles (division being the exception). So if you can, use fixed point.

    Converting to 16:16 fixed point is really easy; you just need to change the type 'float32_t *' to 'int32_t *' and the calculation to a pair of shifts:

              d[i] = (int32_t)(s[i] << 20) >> 4;  /* sign-extend 12 bits and scale to 16:16 */
    

    -This works because fixed point is just an "integer part" and a "fractional part": the integer part is the same as your integer value, and the fractional part is 0.

    So in hexadecimal, a 16:16 fixed-point value would look like this:

    0xiiiiffff  (integer part 'i' in the upper 16 bits, fractional part 'f' in the lower 16)
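
    Putting it together, the fixed-point version of the loop above could look like this (same assumptions as before; an untested sketch, and the function name is just an example):

     void Twos2Fix_Remapp(const uint32_t *twosBuff, int32_t *fixBuff, uint32_t buffLength)
     {
          register int32_t        i;
          register int32_t        *d;
          register const uint32_t *s;

          s = &twosBuff[buffLength];
          d = &fixBuff[buffLength];
          i = -(int32_t)buffLength;
          do
          {
               /* sign-extend the 12-bit sample, then scale it to 16:16 fixed point */
               d[i] = (int32_t)(s[i] << 20) >> 4;
          }
          while (++i);
     }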

    -Also make sure that your destination buffer is not in the same SRAM block the DMA is using, so that your accesses don't disturb the DMA.
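
    For example, with GCC you can pin a buffer to a specific RAM block with a section attribute; the section name and size below are made up and have to match a region that actually exists in your linker script:

     /* Hypothetical example: put the processing buffer in a different RAM
        block than the DMA buffers. ".ram2_bss" must be defined in the linker
        script and mapped to one of the other LPC4370 SRAM blocks. */
     static int32_t procBuff[4096] __attribute__((section(".ram2_bss"), aligned(4)));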

    The larger the DMA buffer is, the larger the 'propagation delay' will be (i.e. the time between input and output).

    If your data is output in real-time, you will want a small DMA buffer.

    But if the DMA buffer is very small, the CPU will spend a lot of time executing non-essential code (such as entering/leaving interrupts and subroutines) instead of actually working on the data.

    -So you'll need to find the right balance, and when you've found a size that just works, give it a little more room; 40% extra is often a good choice. I would not recommend less than 10% extra.
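
    As a rough worked example using the 40 Msps rate from your post: a (hypothetical) 4096-sample buffer fills in 4096 / 40,000,000 s ≈ 102 µs, which is both the added latency per buffer and the time budget in which the previous buffer must be converted and processed. At 204 MHz that is roughly 21,000 CPU cycles, i.e. only about 5 cycles per sample, which is why software floating point struggles to keep up.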
