Hi there,
I am the author of the open source Orbuculum tools for SWO data parsing on Cortex-M targets. I am currently expanding those tools by implementing 1-, 2- and 4-bit parallel TRACEDATA capture from Cortex-M3/M4 CPUs, using a small FPGA connected to the TRACE port running in continuous mode. There is plenty of documentation about parsing the data once it has reached the host machine, but very little about the precise format of the data bits coming out of the port.
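As a rough sketch of the capture side, the function below packs raw port samples into bytes. It rests on assumptions the thread does not confirm: each sample holds TRACEDATA[width-1:0] as an integer with TRACEDATA0 in bit 0, samples are taken on alternating clock edges in time order, and the port sends each byte least-significant bits first. `pack_samples` is a hypothetical name, not part of Orbuculum.

```python
def pack_samples(samples, width):
    """Pack DDR port samples of `width` bits (1, 2 or 4) into bytes.

    Assumption: TRACEDATA0 is bit 0 of each sample, samples are in capture
    order, and the first sample supplies the least significant bits of a byte.
    Any trailing samples that do not fill a whole byte are dropped.
    """
    if width not in (1, 2, 4):
        raise ValueError("width must be 1, 2 or 4")
    per_byte = 8 // width          # samples needed to build one byte
    mask = (1 << width) - 1
    out = bytearray()
    for start in range(0, len(samples) - per_byte + 1, per_byte):
        value = 0
        for k in range(per_byte):
            # Later samples carry progressively higher-order bits.
            value |= (samples[start + k] & mask) << (k * width)
        out.append(value)
    return bytes(out)
```

With a 4-bit port, `pack_samples([0xF] * 7 + [0x7], 4)` gives `FF FF FF 7F`, which is how I would expect the 0x7FFFFFFF sync to appear if values go out least-significant byte first. Picking which sample starts a byte is exactly the alignment question raised in the next paragraph.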
By trial and error, I think I am now collecting this DDR data correctly, but I want to verify and validate what I have done. I am not certain of the exact details of the re-sync (7F FF FF FF) and pacing (7F FF) messages, nor of whether the data can be bit-shifted relative to the port (e.g. is b0 always on TRACEDATA0? From experimentation it appears that the data can be shifted by an arbitrary number of bits). Is there any documentation or, even better, Verilog examples of collecting the data from this port? I haven't found anything so far, but there's a _lot_ of documentation in the world of Arm, so I could easily have missed something.
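One way to handle an arbitrary bit shift is to search the raw bitstream for the full sync and realign from there. This sketch assumes the sync 0x7FFFFFFF is sent least-significant bit first, so it shows up as 31 consecutive one-bits followed by a zero; the helper names are hypothetical. A single match could in principle be a coincidence in the data, so a real implementation would probably want several consistent syncs before trusting the alignment.

```python
def bits_lsb_first(data):
    """Expand bytes into a list of bits, least significant bit of each byte first."""
    return [(byte >> i) & 1 for byte in data for i in range(8)]


def find_sync(bits):
    """Return the index of the first bit after a full sync, or None if absent.

    Assumption: a full sync is 31 (or more) one-bits followed by a zero bit.
    """
    ones = 0
    for i, bit in enumerate(bits):
        if bit:
            ones += 1
        else:
            if ones >= 31:
                return i + 1       # next bit starts a byte-aligned stream
            ones = 0
    return None


def realign(bits, start):
    """Repack bits from `start` onwards into bytes (LSB first), dropping any tail."""
    out = bytearray()
    for pos in range(start, len(bits) - 7, 8):
        out.append(sum(bits[pos + k] << k for k in range(8)))
    return bytes(out)
```

For example, prefixing three stray bits to `FF FF FF 7F 12 34` makes `find_sync` return 35, and `realign(bits, 35)` recovers `b"\x12\x34"`.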
Thanks in advance
DAVE
Hi Sean,
Thanks for the speedy reply. Indeed the 'phase' does seem to remain constant once communication is established (although I constantly check for syncs, so if it changed I would re-adjust automatically). Given the limitations of the FPGA I'm using, I can accept that it's constrained to the 4-bit port for the time being :-) I suspect the phase might change depending on the low-power handling of a particular CPU, but so far the only way I've been able to make it change is by power cycling the target.
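Checking every sync and re-adjusting when the phase moves could be modelled along these lines. This is a hypothetical streaming helper, assuming (as above) that a full sync appears as 31 one-bits followed by a zero. It records the bit offset, modulo 8, at which bytes start after each sync, and counts how often that offset changes after the first lock.

```python
class SyncTracker:
    """Track the byte phase implied by full syncs in a captured bitstream.

    Hypothetical helper: `phase` is the bit offset (0-7) at which bytes start,
    and `changes` counts how many times a later sync moved that offset, which
    is where a downstream byte assembler would re-adjust.
    """

    def __init__(self):
        self.position = 0      # total bits consumed so far
        self.ones = 0          # length of the current run of one-bits
        self.phase = None      # byte-boundary offset, once a sync has been seen
        self.changes = 0       # phase moves after the first lock

    def feed(self, bit):
        """Consume one bit; return True if this bit completed a full sync."""
        self.position += 1
        if bit:
            self.ones += 1
            return False
        synced = self.ones >= 31
        self.ones = 0
        if synced:
            # The next bit (index == self.position) starts a byte.
            new_phase = self.position % 8
            if self.phase is not None and new_phase != self.phase:
                self.changes += 1
            self.phase = new_phase
        return synced
```

Inserting a single extra bit between two syncs moves `phase` by one and bumps `changes`, which is the situation a power cycle or low-power transition might provoke.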
I do see 7f ff within the data stream as 'pace' data, both inter- and intra-frame. I filter these at the lowest level, which makes the data rate up to the FIFO much more manageable. But I think your reply implies one other thing that I hadn't realised. I had assumed that multiple 16-octet frames could be strung together with no intervening sync (7f ff ff ff); is that not the case, and will there _always_ be a sync before a frame? If so, I can make the flow a little more robust by enforcing that requirement.
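The low-level filtering described above might look something like this sketch, which works on a byte stream already aligned by a full sync. Beyond what the thread states, it assumes frames and both sync forms sit on even byte offsets, and that a halfword `FF 7F` (0x7FFF sent least-significant byte first) never occurs as genuine frame content, which I believe holds because ID 0x7F is reserved. It does not require a sync before each frame, since the reply below confirms frames can run back to back. Discarding a partial frame on a full sync is a robustness choice of this sketch, and `extract_frames` is a hypothetical name.

```python
FULL_SYNC = b"\xff\xff\xff\x7f"   # 0x7FFFFFFF, least significant byte first
HALF_SYNC = b"\xff\x7f"           # 0x7FFF pacing halfword


def extract_frames(data):
    """Split a byte-aligned TPIU stream into 16-byte frames.

    Half-word syncs are dropped wherever they occur, including between the
    halfwords of a frame. A full sync discards any partial frame.
    """
    frames = []
    current = bytearray()
    i = 0
    while i + 1 < len(data):
        if data[i:i + 4] == FULL_SYNC:
            current.clear()        # resync: a partial frame is suspect
            i += 4
            continue
        if data[i:i + 2] == HALF_SYNC:
            i += 2                 # pacing only, not frame content
            continue
        current += data[i:i + 2]
        i += 2
        if len(current) == 16:
            frames.append(bytes(current))
            current.clear()
    return frames
```

A frame with a pacing halfword in its middle, followed by a full sync and two back-to-back frames, comes out as three clean 16-byte frames.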
It all seems to be working quite well at the moment, and it makes me smile to have these kinds of capabilities on a $22 dev board, but I'm just trying to make sure the assumptions I've made are valid before I start making the stuff too openly available to others.
Regards
Your assumption is correct. The 16-byte frames _can_ run back to back, but this depends on the timing of input data to the TPIU. After byte ~12 has been output, one more data byte is sufficient to close the frame (depending on the current ID state), and I think there is a 'rounding up' which prefers a full frame (since we may be forced to use ID 0 for the NULL source). If we're at byte 12 and get more than 3 more bytes of input, the subsequent frame starts with no delay (so the best-case protocol overhead is 1/16). Typical trace packets are single-byte, so I'd expect roughly a 50/50 chance of seeing back-to-back frames, or maybe less if the clock ratio is 1:1. In some cases, only a multi-byte packet in the right place would trigger this.
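To make the frame and ID discussion concrete, here is a minimal sketch of demultiplexing one 16-byte frame. The layout is my reading of the CoreSight TPIU formatter rather than anything stated in this thread, so check it against the specification: even bytes are either an ID change (bit 0 set, new ID in bits 7:1) or data whose bit 0 lives in the auxiliary byte 15; odd bytes 1 to 13 are always data; and for an ID change a set aux bit delays the switch until after the following data byte. It also shows the 1/16 arithmetic: a frame with no ID change carries 15 data bytes plus the aux byte. `decode_frame` is a hypothetical name.

```python
def decode_frame(frame, current_id):
    """Demultiplex one 16-byte TPIU formatter frame (layout assumed, see above).

    Returns (list of (id, byte) pairs, ID in force at the end of the frame).
    ID 0 is the NULL source; a caller would normally discard its bytes.
    """
    if len(frame) != 16:
        raise ValueError("TPIU frames are 16 bytes")
    aux = frame[15]
    out = []
    for k in range(8):
        even = frame[2 * k]
        aux_bit = (aux >> k) & 1
        pending_id = None
        if even & 1:
            new_id = even >> 1
            if aux_bit:
                pending_id = new_id    # switch after the next data byte
            else:
                current_id = new_id    # switch immediately
        else:
            # Data byte: its bit 0 was moved into the aux byte.
            out.append((current_id, (even & 0xFE) | aux_bit))
        if k < 7:
            out.append((current_id, frame[2 * k + 1]))   # odd bytes are data
        if pending_id is not None:
            current_id = pending_id
    return out, current_id
```

A frame consisting purely of data yields 15 bytes for the current ID, matching the best-case overhead quoted above.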
Sounds like you're about to make trace much more accessible to developers, which is nice.
Sean,
Thanks for this. Very helpful indeed. When the thing is solid enough for prime time I'll post back here so anyone following can pick it up. In the meantime, the development status is available at https://github.com/mubes/orbuculum. That's the link to the generic decode package (which also supports SWO over various physical interfaces), and the trace stuff is in the orbtrace directory under there. It's changing pretty much daily at the moment, though. No laughing at the Verilog; I'm an embedded guy, not a PL one.