
ARM PMU - Event 0x56 - No instructions for issue (A8)

Note: This was originally posted on 30th October 2012 at http://forums.arm.com

I am using peemuperf (https://github.com/prabindh/peemuperf/blob/master/README.md) to get various cache profile results from the ARM PMU, along with some EMIF counters specific to TI processors. I am currently looking at one of the A8 processors. In the ARM performance monitoring unit, the SEL field of the EVTSEL register selects the event ID. Event 0x56 is defined as "Increment for every cycle that no instructions are available for issue".
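
For readers who want to see what the event selection looks like at register level, here is a minimal sketch. It is not taken from peemuperf. The helper names are hypothetical, and the CP15 encodings (PMNXSEL at c9, c12, 5; EVTSEL at c9, c13, 1; PMCNT at c9, c13, 2) and the SEL field position in bits [7:0] are my reading of the Cortex-A8 TRM, so check them against the TRM before relying on them. The accesses need privileged mode, and enabling the counters (PMNC, CNTENS) is left out.

```c
#include <stdint.h>

/* Event number quoted in the post: cycles with no instruction available for issue. */
#define PMU_EVENT_NO_ISSUE 0x56u

/* Hypothetical helper: assumes the SEL field occupies EVTSEL bits [7:0]. */
static inline uint32_t evtsel_value(uint32_t event)
{
    return event & 0xFFu;
}

#if defined(__arm__) && defined(__GNUC__)
/* Hypothetical helper: route `event` to event counter `counter` (privileged mode only). */
static inline void pmu_select_event(uint32_t counter, uint32_t event)
{
    /* PMNXSEL: choose which event counter the following accesses refer to. */
    __asm__ volatile("mcr p15, 0, %0, c9, c12, 5" : : "r"(counter & 0x1Fu));
    /* Let the counter selection take effect before EVTSEL is written. */
    __asm__ volatile("isb" : : : "memory");
    /* EVTSEL: write the event number into the SEL field. */
    __asm__ volatile("mcr p15, 0, %0, c9, c13, 1" : : "r"(evtsel_value(event)));
}

/* Hypothetical helper: read the currently selected event counter (PMCNT). */
static inline uint32_t pmu_read_selected(void)
{
    uint32_t value;
    __asm__ volatile("mrc p15, 0, %0, c9, c13, 2" : "=r"(value));
    return value;
}
#endif
```

With these helpers, `pmu_select_event(0, PMU_EVENT_NO_ISSUE)` would attach event 0x56 to counter 0, and `pmu_read_selected()` would return its current count.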

With the goal of finding cycles lost while the CPU waits for an I-cache miss and refill, I used 0x56 and see some correlation, but with the issues below:

- At no-load conditions (no UI or other application, negligible CPU load), this counter gives a count equal to the CPU clock speed. On a 720 MHz processor, I see 720M cycles. This does not match the description. What is the explanation?

- At high loads, the counter gives a more meaningful number, but it still appears high, which suggests to me that this count is not just cycles spent waiting for I-cache refills.

Is there a better explanation, or are validated results available?
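
To compare the idle and loaded cases on the same scale, the raw count can be divided by the cycle count for the same interval. The sketch below is only that arithmetic. The function name is hypothetical, and it assumes the 720M figure was collected over one second, so the cycle count is also about 720M.

```c
#include <stdint.h>

/* Hypothetical helper: fraction of cycles in which the event fired.
   Returns 0.0 when no cycles were measured, to avoid dividing by zero. */
static double event_fraction(uint64_t event_count, uint64_t cycle_count)
{
    if (cycle_count == 0) {
        return 0.0;
    }
    return (double)event_count / (double)cycle_count;
}
```

Under that one-second assumption, `event_fraction(720000000ULL, 720000000ULL)` gives 1.0, meaning the event fired on every cycle in the idle case.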

  • Note: This was originally posted on 30th October 2012 at http://forums.arm.com

      -          At high loads, there is a more meaningful number coming out of the counter, but still it appears to be high, that indicates to me that this cycle count is not just that of CPU waiting for I-cache refills.


    The wording in the TRM isn't that specific. That is, it isn't really clear whether it means that the prefetch buffer is empty so no instructions could be issued, or just that no instructions are issued. In the latter case you'd get an increment any time you can't issue because an operand isn't ready or because the needed execution units are in use.

    Even in the former case it wouldn't only happen while waiting for I-cache misses. The fetch unit works in 64-bit aligned fetches and has a one-cycle delay while following a taken branch. So if you branch to or from locations that aren't 64-bit aligned, you'll only fetch one instruction of two (assuming 32-bit instructions), and when you branch you'll lose the opportunity to fetch two instructions. Very tight code with no stalls can therefore exhaust the fetch capability.

    You can test which it is by looping over a long string of dependent loads (like ldr r0, [r0] over and over). You should only be able to issue one every three cycles, so two-thirds of the cycles would have nothing issued, but the fetch buffer will quickly fill and stay full.
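
    A rough C sketch of that dependent-load test follows. The function and macro names are hypothetical. The compiler may not emit exactly `ldr r0, [r0]`, and the loop adds a compare and branch every eight loads, so check the disassembly or write the chain in assembly for a cleaner measurement. Reading event 0x56 around the call is left to whatever PMU tool is in use.

```c
#include <stddef.h>

/* One dependent load: the next address is the value just loaded.
   The volatile access keeps the compiler from removing the load. */
#define CHASE_STEP(p) ((p) = *(void *volatile *)(p))

/* Hypothetical helper: performs 8 * rounds dependent loads starting at `start`
   and returns the last pointer loaded. */
static void *chase_pointers(void *start, size_t rounds)
{
    void *p = start;
    size_t i;

    for (i = 0; i < rounds; i++) {
        /* Unrolled so that loads dominate the loop overhead. */
        CHASE_STEP(p); CHASE_STEP(p); CHASE_STEP(p); CHASE_STEP(p);
        CHASE_STEP(p); CHASE_STEP(p); CHASE_STEP(p); CHASE_STEP(p);
    }
    return p;
}
```

    Calling it on a cell that points to itself (`void *cell = &cell; chase_pointers(&cell, 1000000);`) mirrors the `ldr r0, [r0]` case. If the counter then reads close to two-thirds of the cycle count, that points to the "no instructions are issued" reading. A value near zero would point to the "prefetch buffer is empty" reading.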