Is the ETM outpaced by recent Cortex-M cores?

I am using a microcontroller with a Cortex-M7 @ 1 GHz and a Cortex-M4 @ 400 MHz. The M4 is currently unused and switched off.
I am trying to debug my application with Lauterbach uTrace tools.
My application code resides in ITCM, while data is placed in DTCM.

Unfortunately, it seems that the core simply overwhelms the ETM.

When I trace the program I get an enormous number of FIFOFULL events, which makes the trace useless. I use FreeRTOS, and its idle loop alone produces a huge amount of trace packets. I tried lowering the core clock, and it seems it must be no more than 100 MHz, one tenth of the nominal operating frequency!
Even then, as soon as the network stack (lwIP) is in use, I get FIFOFULLs again.


I then tried data trace only. When tracing writes to a 16-byte structure that happen every 125 us, I again get FIFOFULL events with a 1 GHz clock. If I restrict the trace to only two words of the structure, the FIFOFULLs disappear, but the ETM still seems to be failing: one of the two fields is traced at the right times, while the other is traced about 10 to 100 times less often. The Lauterbach experts I talked to assume that the core generates events at a much higher rate than the ETM can "consume" them.

Lauterbach experts have identified a number of critical issues that affect the success of a trace:

  • Core speed
  • Use of TCM memories
  • Use of DMA (hypothesis to be verified)
  • Use of a 4-wire bus


Cores such as the Cortex-M7, and even more so the Cortex-M85 or multicore devices, now reach performance comparable to some Cortex-A or Cortex-R cores. Core frequencies reached the GHz range some years ago and will probably go beyond it soon. Given all this, the trace peripheral currently in use seems badly underpowered.

Do you have any tricks to make tracing work well without being forced to slow the core down so much?

Or can you explain why such a clearly undersized trace peripheral is still being used?

Best regards

Max