
Coprocessor interface information

Note: This was originally posted on 29th July 2013 at http://forums.arm.com

Hi

I'm continuing to struggle with manually invalidating the caches on my Cortex-A9-based Zynq-7000 system...

My difficulty stems from using the on-board DMA-330 controller. The Xilinx architecture seems to leave me having to invalidate and flush (clean) all the data caches manually, which is rather time-consuming with the ARM cache controllers because they offer only line-by-line maintenance rather than range-based instructions. I'm trying to estimate the peak DMA throughput I might achieve, given that most of the time seems to be lost in cache invalidation and flushing.
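For reference, line-by-line maintenance over a DMA buffer looks roughly like the sketch below. It is a minimal sketch, assuming bare-metal code running in a privileged mode on an ARMv7-A core and a 32-byte L1 data cache line (the Cortex-A9 line size); the helper names (`dcache_clean_line`, `dcache_invalidate_line`, `dsb_barrier`, `dcache_range_op`) are hypothetical. It issues one CP15 operation per line (DCCMVAC to clean, DCIMVAC to invalidate) and a single DSB after the whole loop rather than one between lines. On non-ARMv7 builds the CP15 instructions are compiled out and the loop only counts lines, which is handy for estimating cost. It does not cover the outer L2 cache controller, which needs its own maintenance.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: Cortex-A9 L1 data cache line size of 32 bytes. */
#define DCACHE_LINE_SIZE 32u

/* Hypothetical helper: clean one line by MVA to the point of coherency. */
static inline void dcache_clean_line(uintptr_t mva)
{
#if defined(__ARM_ARCH_7A__)
    __asm__ volatile("mcr p15, 0, %0, c7, c10, 1" : : "r"(mva) : "memory"); /* DCCMVAC */
#else
    (void)mva; /* host build: no cache operation, just counted by the caller */
#endif
}

/* Hypothetical helper: invalidate one line by MVA to the point of coherency. */
static inline void dcache_invalidate_line(uintptr_t mva)
{
#if defined(__ARM_ARCH_7A__)
    __asm__ volatile("mcr p15, 0, %0, c7, c6, 1" : : "r"(mva) : "memory"); /* DCIMVAC */
#else
    (void)mva;
#endif
}

/* Data synchronization barrier: waits for the queued maintenance ops to complete. */
static inline void dsb_barrier(void)
{
#if defined(__ARM_ARCH_7A__)
    __asm__ volatile("dsb" : : : "memory");
#endif
}

/* Apply a per-line operation to every cache line overlapping [start, start + len),
   then issue one DSB. Returns the number of lines touched. */
size_t dcache_range_op(uintptr_t start, size_t len, void (*op)(uintptr_t))
{
    if (len == 0)
        return 0;

    uintptr_t line = start & ~(uintptr_t)(DCACHE_LINE_SIZE - 1); /* round down to a line */
    uintptr_t end = start + len;
    size_t n = 0;

    for (; line < end; line += DCACHE_LINE_SIZE) {
        op(line);
        n++;
    }
    dsb_barrier(); /* one barrier for the whole range, not one per line */
    return n;
}
```

Invalidating a partially covered line at either edge of the buffer can discard neighbouring data, so DMA buffers are best aligned to, and sized in, whole cache lines. A 1 MiB buffer is 32768 lines, which is where the per-line cost adds up.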

Isogen74 was generous enough to answer my question about whether a data synchronization barrier (DSB) is needed between sequential accesses to CP15, but I'm still struggling to get good performance estimates, perhaps because of my own scruffy thinking.

My guess is that the Cortex-A9 core is much faster than the coprocessor. Since a DSB isn't needed between sequential accesses, there must be some other mechanism by which the coprocessor holds off the core, and that mechanism must affect performance. I'd really like to understand this interaction but haven't yet tracked down a document that describes it.


For instance, if there's a FIFO queueing requests to the coprocessor, its depth will determine how many cache lines can be invalidated in a burst before the core stalls. If the coprocessor runs at (perhaps?) a 16:1 cycle ratio, I can then start estimating the maximum burst size and inter-burst spacing I'll need to avoid impacting other threads.
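The FIFO reasoning above can be turned into a back-of-envelope model. This is a hypothetical sketch: the queue depth and the 16:1 ratio are the guesses from the question, not documented Cortex-A9 figures, and the `cp_model` type and function names are made up for illustration. It treats the burst size as the queue depth (ignoring the few entries drained while the burst is being issued), the inter-burst gap as the time to drain a full queue, and the overall cost as one backend slot per line.

```c
/* Hypothetical model of a queued coprocessor backend; all figures are guesses. */
typedef struct {
    unsigned fifo_depth;    /* maintenance ops the queue can hold */
    unsigned cycles_per_op; /* core cycles the backend needs per op, e.g. 16 */
} cp_model;

/* Lines that can be issued back-to-back before the core stalls,
   assuming the queue starts empty. */
unsigned max_burst_lines(const cp_model *m)
{
    return m->fifo_depth;
}

/* Core cycles to wait after a full burst for the queue to drain. */
unsigned long drain_cycles(const cp_model *m)
{
    return (unsigned long)m->fifo_depth * m->cycles_per_op;
}

/* Lower bound on core cycles to process n_lines: the backend is the bottleneck,
   so burst spacing changes when the cost is paid, not how much. */
unsigned long long total_cycles(const cp_model *m, unsigned long long n_lines)
{
    return n_lines * m->cycles_per_op;
}
```

For example, with a guessed depth of 8 and a 16:1 ratio, a burst is 8 lines followed by a 128-cycle gap, and a 1 MiB buffer (32768 lines of 32 bytes) costs at least 32768 × 16 = 524288 core cycles however the bursts are spaced.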


Does anyone know where I might find a description of the coprocessor architecture and the core arbitration mechanisms?


Cheers


Joe.