Cortex-A53 direct access to cache: How are instructions encoded in the L1 I-cache?

The TRM for the Cortex-A53 has a section on direct access to various internal memories, including the L1 I-cache and D-caches. I'm successfully able to dump both tag and data for the I-cache and D-cache, but I'm having trouble making sense of the I-cache data encoding. The TRM specifies that bits [19:0] from Data Register 0 and 1 combine, in A32 or A64 state, to form a 40-bit "single pre-decoded instruction". I've successfully polluted the I-cache with long runs of NOPs and other instructions, but I've been unable to properly identify the instruction data read back from the I-cache. I can tell that the same instruction has been cached to nearly every cacheline, but I don't understand the encoding and how to convert those instructions back to the original encoding. Does anyone know of further documentation on this encoding/format? I've run multiple variations of logical (register) operators and I can tell that different nibbles correspond to things such as registers, immediates, and flags, but this is tedious.

Also, if you've read this far, why is that the L1 caches have direct access, but the L2 cache (which is optional) has no such mechanism. I assume that there is a good architectural reason, but I would expect direct access to the L2 before the L1.

Thanks!

More questions in this forum