Hi to you all, I have a firmware running on an NXP LPCLink2 (LPC4370: 204 MHz Cortex-M4 MCU) board which basically does this:
My problem is that my code is too slow, and every now and then an overwrite occurs.
Using the DMA I'm saving the ADC data, which I get in two's complement format (offset binary is also available), into a uint32_t buffer, and I try to prepare it for the CMSIS DSP functions by converting the buffer into float32_t: here's where the overwrite occurs. It's worth saying that I'm currently using software floating point, not hardware.
The CMSIS library also accepts fractional formats like q31_t, q15_t and so on, and since I don't strictly need floating-point maths I could even use these formats if that would save me precious time. It feels like I'm missing something important about this step, which is no surprise since this is my first project on a complex MCU; any help/hint/advice would be highly appreciated and would help me with my thesis.
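To give an idea of what I mean, here is a minimal sketch of the kind of conversion loop I'm talking about; the buffer length, the 12-bit sample width and the two-samples-per-word packing are just assumptions for the example, my real setup may differ:

```c
#include <stdint.h>
#include "arm_math.h"   /* CMSIS DSP types: q15_t, float32_t */

#define NUM_WORDS    128u               /* DMA buffer length in 32-bit words (assumption) */
#define NUM_SAMPLES  (2u * NUM_WORDS)   /* two packed samples per word (assumption)       */

/* Unpack two's complement 12-bit samples, two per 32-bit DMA word,
 * into Q15 for the CMSIS DSP fixed-point functions.
 * Shifting the 12-bit value left by 4 maps it onto the full Q15 range and
 * moves the sign bit into bit 15 (relies on the usual two's complement
 * wrap of the cast to a 16-bit signed type). */
static void adc_buffer_to_q15(const uint32_t *dma_buf, q15_t *out)
{
    for (uint32_t i = 0u; i < NUM_WORDS; i++) {
        uint32_t w = dma_buf[i];
        out[2u * i]      = (q15_t)((w & 0x0FFFu) << 4);          /* lower sample */
        out[2u * i + 1u] = (q15_t)(((w >> 16) & 0x0FFFu) << 4);  /* upper sample */
    }
}
```

If I then still needed floats, CMSIS DSP also provides arm_q15_to_float() to convert a whole block afterwards.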
I'll leave here the link to the (more detailed) question I asked in the NXP forums, just in case: LPC4370: ADCHS, GPDMA and CMSIS DSP | NXP Community.
Thanks in advance!
You're right. Now that you have an efficient computation technique, you can still improve the overall efficiency.
Usually, I try to let the compiler do its job where it's good!
In fact, you need to ask yourself what you can do to help it generate efficient code:
I made quite a detailed analysis about this on my blog (Simplest algorithm ever).
In the end:
- try to fix everything you can at compile time (bit shift count, buffer size, loop count ...)
- limit code visibility to what's necessary: static functions allow inlining optimizations inside a module; the same goes for variables, so do not use module variables (placed in RAM) when local variables will do (a small sketch below illustrates both points)
As demonstrated in my post, this will let you write safe code and allow the compiler to get rid of unused parts!
All of this is only true when you need to reach the best efficiency and can afford to turn compiler optimizations ON and set them very high!
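For example, here is a minimal sketch of what those two points could look like for a scaling loop like yours; the constants and function names are only placeholders for the example:

```c
#include <stdint.h>

#define BLOCK_SIZE  256u   /* fixed at compile time: the loop count is known   */
#define ADC_SHIFT   4u     /* fixed shift count instead of a runtime variable  */

/* 'static' keeps this helper local to the module, so the compiler is free
 * to inline it into its single caller and unroll the constant-length loop. */
static void scale_block(const int16_t *in, int16_t *out)
{
    for (uint32_t i = 0u; i < BLOCK_SIZE; i++) {
        out[i] = (int16_t)(in[i] << ADC_SHIFT);  /* constant shift, no runtime parameter */
    }
}

/* Only this entry point is visible outside the module; buffers are passed in
 * as arguments instead of living as module (RAM) variables. */
void process_block(const int16_t *in, int16_t *out)
{
    scale_block(in, out);
}
```

With optimizations at -O2 or -O3 the compiler can then inline the helper, unroll the fixed-length loop and drop anything that is not referenced.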
Thank you Thibaut for your detailed answers. I studied and evaluated your code today and I got good results: it took roughly 22.5 µs for 128 32-bit words, so, correct me if I'm wrong, actually 256 samples! That's the same time it took with the previous implementation, without the bit shifting.
Thanks for these hints. I read the article you linked and now I think I better understand this new (to me, of course) way of programming you are showing: I feel like I need to study *a lot*. I just wonder how I can get those nice compiler outputs in GCC/LPCXpresso (which is actually a forked version of Eclipse).
Thanks again for your help! I'm going to close this post now, but it was nice and helpful. Unfortunately I can choose just one correct answer, but I'd like to thank you all (once again) for what you are doing here. Lovely community.