Turning on MMU and caches on Cortex-A7?

In my little program (rpi_stub) it's time to turn on the MMU and caches.

I seem to have a handle on most of it, except cache invalidation.

In a multicore situation (rpi_stub doesn't support that yet, but maybe later), what needs to be invalidated, and how?

I understand I-cache, D-cache, L2-cache, branch predictor, I-TLB, D-TLB, but what else?

And to what depth? I understand that the I-cache can't be invalidated further than the PoU, but is that enough?

There is an invalidation command for a D-cache line up to the PoC. (I understand the D-cache has to be invalidated line by line.)
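To make the "line by line" part concrete, here is a minimal sketch in C of how I currently understand a full invalidate by set/way to work. It assumes the ARMv7 CCSIDR layout (LineSize in bits [2:0], associativity minus one in bits [12:3], number of sets minus one in bits [27:13]) and the set/way operand format from the architecture manual. All names (cache_geom, decode_ccsidr, setway_operand, for_each_setway, count_op) are my own, and the callback is a host-side stand-in: on the target it would issue the actual CP15 write (DCISW, MCR p15, 0, Rt, c7, c6, 2).

```c
#include <assert.h>
#include <stdint.h>

/* Geometry of one cache level, decoded from an ARMv7 CCSIDR value. */
struct cache_geom {
    uint32_t line_bytes;  /* bytes per cache line */
    uint32_t ways;        /* associativity */
    uint32_t sets;        /* number of sets */
};

/* Smallest k such that (1 << k) >= n; n must be at least 1. */
static uint32_t ceil_log2(uint32_t n)
{
    assert(n >= 1);
    uint32_t k = 0;
    while (k < 32 && ((uint32_t)1 << k) < n)
        k++;
    return k;
}

/* Decode the geometry fields of CCSIDR (assumed ARMv7 layout). */
static struct cache_geom decode_ccsidr(uint32_t ccsidr)
{
    struct cache_geom g;
    /* LineSize = log2(words per line) - 2, so bytes = 2^(LineSize + 4). */
    g.line_bytes = (uint32_t)1 << ((ccsidr & 0x7u) + 4);
    g.ways = ((ccsidr >> 3) & 0x3FFu) + 1;
    g.sets = ((ccsidr >> 13) & 0x7FFFu) + 1;
    return g;
}

/* Build the register value for a set/way operation.
 * level is zero-based (0 = L1). Way sits in the top bits,
 * set starts at log2(line_bytes), level is in bits [3:1]. */
static uint32_t setway_operand(const struct cache_geom *g, uint32_t level,
                               uint32_t set, uint32_t way)
{
    uint32_t a = ceil_log2(g->ways);
    uint32_t l = ceil_log2(g->line_bytes);
    uint32_t way_field = (a == 0) ? 0 : (way << (32 - a));
    return way_field | (set << l) | (level << 1);
}

/* Callback type: on the target this would perform the CP15 write. */
typedef void (*setway_op)(uint32_t operand, void *ctx);

/* Visit every line of one cache level, one set/way operation per line. */
static void for_each_setway(const struct cache_geom *g, uint32_t level,
                            setway_op op, void *ctx)
{
    for (uint32_t way = 0; way < g->ways; way++)
        for (uint32_t set = 0; set < g->sets; set++)
            op(setway_operand(g, level, set, way), ctx);
}

/* Host-side stand-in for the real operation: just counts the calls. */
static void count_op(uint32_t operand, void *ctx)
{
    (void)operand;
    (*(uint32_t *)ctx)++;
}
```

With a CCSIDR value of 0xFE01A (my assumption for a 32 KB, 4-way L1 D-cache with 64-byte lines) this decodes to 128 sets and 4 ways, so the loop would issue 512 operations for that level.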

And do they all need to be invalidated separately, or does, say, D-cache invalidation invalidate L2 too?

Is there a common TLB-invalidation?

And where do I find this information in the documents? There is some "implementation dependent" stuff (like individual bits) in the architecture manual, but the Cortex-A7 MPCore TRM doesn't mention some of it at all.

(For example: bit 9 of first-level translation table entries.)

In the architecture manual, the list of cache maintenance operations gives links to the operation descriptions, but each description just repeats the (rough) description of the function and links back to the list.

Where can I find the contents of the registers that are read from and written to CP15?

Also, is there any example code around?