The TRM for the Cortex-A53 has a section on direct access to various internal memories, including the L1 I-cache and D-cache. I can dump both tag and data for the I-cache and D-cache, but I'm having trouble making sense of the I-cache data encoding. The TRM specifies that bits [19:0] from Data Register 0 and Data Register 1 combine, in A32 or A64 state, to form a 40-bit "single pre-decoded instruction". I've polluted the I-cache with long runs of NOPs and other instructions, but I've been unable to identify the instruction data read back from the I-cache. I can tell that the same instruction has been cached in nearly every cache line, but I don't understand the encoding or how to convert those values back to the original instruction encoding. Does anyone know of further documentation on this encoding/format? I've run multiple variations of logical (register) instructions and I can tell that different nibbles correspond to things such as registers, immediates, and flags, but working it out this way is tedious.
Also, if you've read this far, why is it that the L1 caches have direct access, but the L2 cache (which is optional) has no such mechanism? I assume that there is a good architectural reason, but I would have expected direct access to the L2 before the L1.
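For anyone trying the same nibble-by-nibble comparison offline, here is a minimal Python sketch of the bookkeeping. It is only an illustration: the helper names are hypothetical, the sample register values are made up, and placing Data Register 0 in the low 20 bits is an assumption (check the TRM for the actual ordering). It does not decode the pre-decoded format, which is undocumented.

```python
MASK20 = (1 << 20) - 1  # bits [19:0] of each data register

def combine_predecoded(dr0, dr1):
    """Join bits [19:0] of the two data registers into one 40-bit value.

    ASSUMPTION: Data Register 0 supplies the low 20 bits and
    Data Register 1 the high 20 bits. Swap them if the TRM says otherwise.
    """
    return ((dr1 & MASK20) << 20) | (dr0 & MASK20)

def differing_nibbles(a, b, width_bits=40):
    """Return nibble positions (0 = least significant) where a and b differ."""
    diff = a ^ b
    return [i for i in range(width_bits // 4) if (diff >> (4 * i)) & 0xF]

# Made-up register dumps for two cached instructions that differ in one operand.
word_a = combine_predecoded(0x00012345, 0x000ABCDE)
word_b = combine_predecoded(0x00012045, 0x000ABCDE)
print(f"{word_a:010x} vs {word_b:010x}: nibbles {differing_nibbles(word_a, word_b)}")
```

Running this over pairs of instructions that differ in a single field (one register number, one immediate bit) narrows down which nibbles that field affects, which is the same experiment as above with less manual comparison.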
Does anyone know of further documentation on this encoding/format?
I'm not aware of any public documentation for this; IIRC the main use is for debugging manufacturing failures rather than any programmatic use on a release platform.
There is no documentation we can provide on the encoding of the data you're seeing. Peter hit the nail on the head: it is only useful if you're debugging the decode logic itself, and it would be an extremely rare situation that you'd need to confirm a particular instruction decode. The primary use case for the L1 instruction RAMs is using the tags to determine whether data is cached or not based on addresses, which you don't need the 'mangled' opcode for.
There's a reasonable micro-architectural explanation for the lack of L2 cache RAM access on the Cortex-A53. Unfortunately, ARM cannot discuss micro-architectural details of its products. Suffice it to say that it is, as you guessed, somewhat down to the fact that the L2 cache is optional.
We're curious as to what your project is, by the way, since you have been building up a line of questioning about internal RAM access on multiple Cortex-A cores.
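As a rough illustration of the tag-based check, the following Python sketch computes which set an address maps to and which upper address bits a tag would need to match. It is a sketch under stated assumptions, not the TRM's register format: the function names are hypothetical, the 32 KB cache size is an assumed configuration, and the layout of the tag value returned by the hardware must be taken from the TRM. The 2-way associativity and 64-byte line size are the Cortex-A53 L1 I-cache parameters.

```python
LINE_BYTES = 64  # Cortex-A53 L1 cache line size

def cache_set_index(addr, cache_bytes, ways, line_bytes=LINE_BYTES):
    """Set that an address maps to in a set-associative cache."""
    num_sets = cache_bytes // (ways * line_bytes)
    return (addr // line_bytes) % num_sets

def expected_tag(addr, cache_bytes, ways):
    """Address bits above the index and offset fields.

    ASSUMPTION: the stored tag is comparable to these bits once the
    raw tag RAM value has been decoded according to the TRM.
    """
    way_bytes = cache_bytes // ways
    return addr // way_bytes

# Assumed configuration: 32 KB, 2-way L1 I-cache; the address is illustrative.
addr = 0x80001040
print(hex(cache_set_index(addr, 32 * 1024, 2)), hex(expected_tag(addr, 32 * 1024, 2)))
```

To decide whether an address is cached, read the tag entries of every way in the computed set and compare the decoded tag against the expected value. Note that the I-cache is virtually indexed and physically tagged, so the index would come from the virtual address and the tag from the physical address.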
Thanks for the replies, mwsealey and Peter!
I would be curious too, with those questions in my post history. I'm part of a research group at our university focused on SEU sensitivity testing on embedded systems. I'm always looking for new ways to gather diagnostics during our tests, hence the fun questions. I would ask you all for hints on more ways to gather data, but I assume the TRM and programming guides are about all the public documentation I can get.
Thanks for the comment on the L2 caches. I've been wondering about that for a while now. I would love to deepen my architectural knowledge of all of this someday.
Dear Alex W,
It's been a really long time since you discussed this interesting topic! :)
May I ask for more detail about the debug method you used? UART or JTAG? I'm especially interested in how you dumped the L1 cache!
Also, do you now have any ideas about gathering data from the L2 cache?
Wish you the best fortune and success!