Accelerator Coherency Port

Note: This was originally posted on 25th January 2013 at http://forums.arm.com

Hi all,

I'm trying to use the Accelerator Coherency Port (ACP) of the ARM Cortex-A9 MPCore on the Xilinx Zynq platform (http://www.xilinx.co...vices/index.htm).

1. I have a functional design where a DMA engine in the FPGA region is able to read and write data through the ACP. But is there a direct way to verify that the data is coming from the cache itself? The only option I see is to measure cache hits using the PL310 cache controller event registers against a known data set size. But that is not an exact solution, as the hits may occur in the L1 cache instead of L2.

2. As mentioned here (http://forums.arm.co...pcore-acp-port/), I downloaded the DS-5 tools to get access to the reference design, but there is no specific target design for the ACP. Should the startup code that enables the MMU, L1 caches and SCU be enough to make sure the ACP is getting its data from the cache?

3. The cacheable region can be set in the MMU table. But does that guarantee exclusive access to a fixed memory region? If a Linux OS is running, it may cause cache thrashing. Is there a way to set a priority for the region?

4. Is there Linux support for this? As I understand it, the ACP is technically a hardware feature and should be transparent to software. The only thing to do would be to expose the memory region from kernel space to user space in order to give it to the DMA engine.

Thanks in advance.
  • Note: This was originally posted on 27th January 2013 at http://forums.arm.com

    > But is there direct way to verify that the data is coming from the cache itself
    No.
    > The startup code that enables MMU, L1 caches and SCU should be enough to make sure the ACP is getting the data from cache?

    The request into the ACP determines coherency properties. See http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0407e/CACGGBCF.html
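
    As a rough illustration of what that page describes (an editorial sketch, not code from this thread): on the Cortex-A9 MPCore an ACP request is treated as coherent when it is marked shared (AxUSER[0] = 1) and cacheable (AxCACHE[1] = 1). The helper name `acp_request_is_coherent` is hypothetical, and the bit positions should be checked against the TRM revision in use.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: returns true when the sideband signals of an ACP
 * transaction mark it as coherent, assuming the rule stated above
 * (AxUSER[0] = 1 and AxCACHE[1] = 1). */
static inline bool acp_request_is_coherent(uint32_t axuser, uint32_t axcache)
{
    bool shared    = (axuser  & 0x1u) != 0;  /* AxUSER[0]: shared      */
    bool cacheable = (axcache & 0x2u) != 0;  /* AxCACHE[1]: cacheable  */
    return shared && cacheable;
}
```

    In other words, enabling the MMU, L1 caches and SCU on the CPU side is not sufficient on its own; the DMA master in the fabric also has to drive these two signals on each request.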

    > Maybe if a Linux OS is running, then it can cause cache thrashing. Is there way to set priority for the region?


    If you have multiple masters hammering the cache then yes you will get thrashing. There is no concept of region priority.


    > Is there support for Linux for this. As I understand the ACP is technically a hardware thing and should be transparent to software. Only thing is to do would be to expose the memory region from kernel space to user space to give it to the DMA engine.

    The ACP itself is transparent, but you are going to need some kernel-side device driver to handle the DMA engine and the memory it uses. ACP makes it easier (and avoids cache maintenance), but there are still a lot of other "driver things" you will need (like VA to PA translation, the ability to share one hardware block across multiple user-space processes, ensuring memory cannot move while being accessed by the DMA, etc).


    HTH,
    Iso
  • Note: This was originally posted on 5th February 2013 at http://forums.arm.com



    >> 1. I have a functional design where DMA in the FPGA region is able read and write data through the ACP. But is there direct way to verify that the data is coming from the cache itself. Only option is to measure cache hits using the PL310 cache controller event registers against a known data set size. But it's a not exact solution, as there may be cache hits in the L1 cache hits instead of L2.

    You could try the performance monitoring unit for the L1 cache, provided you are able to access the programmable registers to enable event monitoring.


         Vaibhav



  • Note: This was originally posted on 5th February 2013 at http://forums.arm.com

    Thanks. I already looked at the Performance Monitor Unit (PMU), but the event monitoring does not seem to be activated: while profiling different user functions, it gives the same counter values. I used the following memory-mapped control functions to enable the PMU. I followed the optimization3 example in the DS-5 tools, but that uses CP15 register access instead of memory-mapped control. Maybe something in TrustZone needs to be enabled in order to control and enable the PMU.


    void start_pmu(void)
    {
        X_mWriteReg(PMU_BASE, PMUSERENR, 0x00000001);   // Give user access
        X_mWriteReg(PMU_BASE, PMCR, 0x00000001);        // Enable the PMU (E bit)

        X_mWriteReg(PMU_BASE, PMXEVTYPER0, 0x00000004); // Set event0
        X_mWriteReg(PMU_BASE, PMXEVTYPER1, 0x00000003); // Set event1

        X_mWriteReg(PMU_BASE, PMCNTENSET, 0x80000001);  // Enable counter0 (bit 0) and cycle counter (bit 31)
        X_mWriteReg(PMU_BASE, PMCNTENSET, 0x80000002);  // Enable counter1 (bit 1) and cycle counter (bit 31)

        //X_mWriteReg(PMU_BASE, PMCNTENSET, 0x80000000); // Cycle counter enable only

        X_mWriteReg(PMU_BASE, PMCR, 0x00000004);        // Cycle counter reset (C bit); E is written as 0 here
        X_mWriteReg(PMU_BASE, PMCR, 0x00000002);        // Event counter reset (P bit); E is written as 0 here
    }

    void stop_pmu(void)
    {
        X_mWriteReg(PMU_BASE, PMCNTENCLR, 0x80000000);  // Disable cycle counter
        X_mWriteReg(PMU_BASE, PMCNTENCLR, 0x80000001);  // Disable counter0
        X_mWriteReg(PMU_BASE, PMCNTENCLR, 0x80000002);  // Disable counter1

        u32 value0 = X_mReadReg(PMU_BASE, PMXEVCNTR0);
        u32 value1 = X_mReadReg(PMU_BASE, PMXEVCNTR1);
        u32 value2 = X_mReadReg(PMU_BASE, PMCCNTR);
        printf("Counter0: %lu \n", (unsigned long)value0);
        printf("Counter1: %lu \n", (unsigned long)value1);
        printf("Counter: %lu \n", (unsigned long)value2);
    }
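
    One detail in the listing above may be worth checking, independent of the memory-mapped access question. In the ARMv7 PMU, PMCR bit 0 (E) enables the counters, while bits 1 (P) and 2 (C) are reset triggers. Each PMCR write replaces the whole register, so writing 0x4 and then 0x2 after 0x1 leaves E cleared. The following is a small software model of that behaviour, not hardware access; the function names are hypothetical, and whether this is what kept the counters from moving on the actual hardware is not established by this thread.

```c
#include <stdint.h>

#define PMCR_E (1u << 0)  /* enable all counters                      */
#define PMCR_P (1u << 1)  /* reset event counters (trigger, not kept) */
#define PMCR_C (1u << 2)  /* reset cycle counter (trigger, not kept)  */

/* Toy model of a PMCR write: the P and C bits only trigger a reset and
 * are not stored; every other bit takes the newly written value. */
static inline uint32_t pmcr_model_write(uint32_t value)
{
    return value & ~(PMCR_P | PMCR_C);
}

/* Sequence as posted: enable, then C alone, then P alone. */
static inline uint32_t pmcr_after_posted_sequence(void)
{
    uint32_t pmcr;
    pmcr = pmcr_model_write(PMCR_E);
    pmcr = pmcr_model_write(PMCR_C);  /* E written as 0 */
    pmcr = pmcr_model_write(PMCR_P);  /* E written as 0 */
    return pmcr;
}

/* Alternative: reset both counter groups and enable in a single write. */
static inline uint32_t pmcr_after_combined_write(void)
{
    return pmcr_model_write(PMCR_E | PMCR_P | PMCR_C);
}
```

    In this model the posted sequence ends with E = 0 and the combined write ends with E = 1, which suggests writing 0x7 to PMCR as the last step (or re-writing the E bit after the resets).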

  • Note: This was originally posted on 7th February 2013 at http://forums.arm.com

    Hi all,

    I'm trying to verify the ACP operation by using the Performance Monitor Unit (PMU). I'm working on the Xilinx Zynq platform with the ARM Cortex-A9 cores.

    The PMU is not enabled by the default boot code provided by Xilinx.

    I would assume there are at least three ways to access and control it: memory-mapped registers, the ARM CP15 system control coprocessor, and the debug access port (DAP).

    1. According to the Zynq Technical Reference Manual (http://www.xilinx.com/support/documentation/user_guides/ug585-Zynq-7000-TRM.pdf, page 85), it is accessible in only two ways: the ARM CP15 coprocessor and the debug access port. But surprisingly, the manual provides a software-controlled register space for it on page 881. Does this mean memory-mapped control of the PMU is possible? I tried a memory-mapped implementation and it didn't work. Could there be something in TrustZone that needs to be enabled to control the PMU (which I haven't tried yet)?

    2. The second option is the debug access port (DAP). According to http://www.xilinx.com/support/documentation/user_guides/ug585-Zynq-7000-TRM.pdf, page 581, if the JTAG mode is set to independent mode, the DAP can be accessed through JTAG pins on either the MIO or the EMIO side. At least on the Zynq ZedBoard there is no physical port for this. MIO or EMIO pins need to be routed to the Pmod ports and connected to something like this (http://www.em.avnet.com/en-us/design/drc/Pages/ZedBoard-Processor-Debug-Adapter-.aspx) to get access to the DAP. Is this the only way to access the debug port?

    3. The alternative is the CP15 option. I imported example assembly code from the ARM DS-5 tools (the optimization3 example attached here). The problem is that the Xilinx ARM GCC toolchain from CodeSourcery has a completely different assembler compared to the ARM toolchain. EXPORT, PROC and ENDP are all specific to the ARM assembler. The next step was to modify the reference assembly code and inline it into a C file to be compiled, which worked for the one function I tested. Are there any compiler flags that I can enable to use ARM directives in GNU assembly? And why is this so?

    The only option then is to translate the ARM assembly to GNU assembler format. Before that, I would like to confirm that there is no direct way to use either the memory-mapped registers or the debug access port above to control the PMU. Or are there any other solutions for any of the three options?
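
    For the CP15 route with the GNU toolchain, the ARM assembler directives are not needed at all if the accesses are written as GCC inline assembly. The following is a minimal sketch under several assumptions: the code runs in a privileged mode (as in a bare-metal application), the CP15 encodings are the ARMv7 PMU ones, and every function and enum name is made up for illustration. The `#else` branch is only a host-side stand-in so the file also compiles off-target.

```c
#include <stdint.h>

enum pmu_reg { PMU_PMCR, PMU_PMCNTENSET, PMU_PMSELR,
               PMU_PMXEVTYPER, PMU_PMXEVCNTR, PMU_PMCCNTR, PMU_REG_COUNT };

#if defined(__arm__) && defined(__GNUC__)
/* Target build: ARMv7 PMU registers live in CP15 c9. */
static inline void pmu_write(enum pmu_reg r, uint32_t v)
{
    switch (r) {
    case PMU_PMCR:       __asm__ volatile("mcr p15, 0, %0, c9, c12, 0" :: "r"(v)); break;
    case PMU_PMCNTENSET: __asm__ volatile("mcr p15, 0, %0, c9, c12, 1" :: "r"(v)); break;
    case PMU_PMSELR:     __asm__ volatile("mcr p15, 0, %0, c9, c12, 5" :: "r"(v)); break;
    case PMU_PMXEVTYPER: __asm__ volatile("mcr p15, 0, %0, c9, c13, 1" :: "r"(v)); break;
    default: break;
    }
    __asm__ volatile("isb" ::: "memory");  /* make the write take effect */
}

static inline uint32_t pmu_read(enum pmu_reg r)
{
    uint32_t v = 0;
    switch (r) {
    case PMU_PMCCNTR:   __asm__ volatile("mrc p15, 0, %0, c9, c13, 0" : "=r"(v)); break;
    case PMU_PMXEVCNTR: __asm__ volatile("mrc p15, 0, %0, c9, c13, 2" : "=r"(v)); break;
    default: break;
    }
    return v;
}
#else
/* Host stand-in (assumption for illustration): plain variables, no PMU. */
static uint32_t pmu_host_regs[PMU_REG_COUNT];
static inline void pmu_write(enum pmu_reg r, uint32_t v) { pmu_host_regs[r] = v; }
static inline uint32_t pmu_read(enum pmu_reg r) { return pmu_host_regs[r]; }
#endif

/* Program event counter 0 with `event`, then reset and enable everything. */
static inline void pmu_start(uint32_t event)
{
    pmu_write(PMU_PMSELR, 0u);                        /* select counter 0       */
    pmu_write(PMU_PMXEVTYPER, event);                 /* event for that counter */
    pmu_write(PMU_PMCNTENSET, (1u << 31) | (1u << 0)); /* cycle ctr + counter 0 */
    pmu_write(PMU_PMCR, 0x7u);                        /* E | P | C in one write */
}

/* Read back event counter 0. */
static inline uint32_t pmu_read_event0(void)
{
    pmu_write(PMU_PMSELR, 0u);
    return pmu_read(PMU_PMXEVCNTR);
}
```

    A typical use would be `pmu_start(0x04u);` before the code under test and `pmu_read_event0()` after it, with 0x04 being the event number used earlier in this thread.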

    Thanks in advance.

  • Note: This was originally posted on 8th February 2013 at http://forums.arm.com

    I haven't seen the Xilinx documentation, but I have used the CP15 coprocessor option for the PMU. I am on a different toolchain altogether.


        Vaibhav
  • Note: This was originally posted on 14th February 2013 at http://forums.arm.com

    I got the PMU registers working using inline assembly code for the CP15 register method.

    I have also enabled ACP coherent read and write requests using this: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0388e/index.html

    I am able to monitor and verify the exact number of cache read hits for ACP reads (external DMA reads from the L1 cache) using the ARM PMU event monitors. I get 72 data cache hits, that is 64 + 8. The 8 (4*2) are, I guess, for the X_mWriteReg function calls.


    start_perfmon();
    X_mWriteReg(0x60000000, 0x0,  0x00005000); // Start DMA
    X_mWriteReg(0x60000000, 0x18, 0xFFFF8000); // Src address
    X_mWriteReg(0x60000000, 0x20, 0x80000000); // Dst address
    X_mWriteReg(0x60000000, 0x28, 0x00000040); // Number of bytes: 64
    stop_perfmon();

    But the other way around is tricky to verify (ACP writes: external DMA writes to the cache). As it looks like a write-through policy from the link above, all ACP writes should be written to the L2 cache. Therefore the PL310 L2 event counters for data writes should be around 64 or something similar. Strangely, they show zero or one. Is it really write-through?
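
    For reference, a sketch of how the two PL310 event counters could be set up to count data write requests and data write hits. The register offsets and event codes below are quoted from memory of the PL310 TRM and should be verified there; the function names are hypothetical, and the base address is passed in so that the same code can be exercised against a plain array.

```c
#include <stdint.h>

/* PL310 event counter registers (offsets assumed from the PL310 TRM). */
#define L2C_EV_CTRL      0x200u  /* bit 0: enable, bits 2:1: counter reset */
#define L2C_EV_CNT1_CFG  0x204u  /* bits 5:2: event source for counter 1   */
#define L2C_EV_CNT0_CFG  0x208u  /* bits 5:2: event source for counter 0   */
#define L2C_EV_CNT1      0x20Cu
#define L2C_EV_CNT0      0x210u

/* Event source codes (assumed): data write hit / data write request. */
#define L2C_EV_DWHIT     0x4u
#define L2C_EV_DWREQ     0x5u

static inline void l2c_write(volatile uint32_t *base, uint32_t off, uint32_t v)
{
    base[off / 4u] = v;
}

static inline uint32_t l2c_read(volatile uint32_t *base, uint32_t off)
{
    return base[off / 4u];
}

/* Counter 0 counts data write requests, counter 1 counts data write hits. */
static inline void l2c_count_writes(volatile uint32_t *base)
{
    l2c_write(base, L2C_EV_CTRL, 0x0u);                    /* stop counting     */
    l2c_write(base, L2C_EV_CNT0_CFG, L2C_EV_DWREQ << 2);
    l2c_write(base, L2C_EV_CNT1_CFG, L2C_EV_DWHIT << 2);
    l2c_write(base, L2C_EV_CTRL, 0x7u);                    /* reset both, enable */
}

/* Read both counters after the DMA transfer has completed. */
static inline void l2c_read_writes(volatile uint32_t *base,
                                   uint32_t *requests, uint32_t *hits)
{
    *requests = l2c_read(base, L2C_EV_CNT0);
    *hits     = l2c_read(base, L2C_EV_CNT1);
}
```

    On the Zynq the L2 cache controller base is assumed to be 0xF8F02000, so a call would look like `l2c_count_writes((volatile uint32_t *)0xF8F02000u);` before starting the DMA. Comparing the request count with the hit count may help to tell whether the ACP writes reach the L2 at all.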