Cycle Measurement for CortexA8 on Beagle Board

Note: This was originally posted on 12th May 2010 at http://forums.arm.com

Hi all,
      I'm working on Cortex-A8 hardware (a Beagle Board). I want to measure how many cycles my function takes, using the cycle count register in the CP15 coprocessor. I mapped all of memory with 1 MB descriptors (4096 in total), set the permissions for each descriptor, and enabled the MMU and cache. But when I measure my function, I get more cycles than I expect.

My function is
 
extern unsigned int read_ccount(void);   // Reads the cycle count from the performance monitor register
extern int func(void);                   // Assembly routine below

unsigned int TotalCycles;

int main(void)
{
      unsigned int count1, count2;

      // Cycle counter is enabled here

      count1 = read_ccount();    // Cycle count before the call

      func();

      count2 = read_ccount();    // Cycle count after the call

      TotalCycles = count2 - count1;
      return 0;
}

   .global func

func:
     MOV     r4, #7
     MOV     r5, #2
     MOV     r6, #9
     MOV     r2, #60000
    
nextt1:
     MUL     r7, r5, r4
     MUL     r4, r6, r6
     MUL     r9, r5, r6
     MUL     r7, r6, r5
     SUBS    r2, r2, #1
     MUL     r8, r8, r4
     MUL     r9, r9, r5
     BNE     nextt1    

     MOV     r0, r8
    
    bx lr
    .end
      
The cycle count register reports 960745 cycles. But if I calculate it by hand, it should take around 900000:

  6 MULs   * 2 cycles each * 60,000 iterations = 720000
  1 SUBS   * 1 cycle       * 60,000 iterations =  60000
  1 branch * 2 cycles      * 60,000 iterations = 120000
                                                 ------
                                         Total = 900000 (approx.)

Is the cycle count I'm getting from the cycle counter correct, or am I missing something in my calculation? Can anyone help with this?
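
The hand calculation above can be sketched in C as a small model. This is only an illustration of the arithmetic: the struct and function names are hypothetical, and the per-instruction cycle figures are the poster's assumptions, not values confirmed by the hardware.

```c
#include <stdint.h>

/* Hypothetical model of one loop iteration, using the poster's assumed timings. */
struct loop_model {
    uint32_t mul_count;      /* MUL instructions per iteration */
    uint32_t mul_cycles;     /* assumed cycles per MUL */
    uint32_t sub_cycles;     /* assumed cycles for the SUBS */
    uint32_t branch_cycles;  /* assumed cycles for the BNE */
};

/* Cycles for a single iteration if every instruction issues on its own. */
static uint32_t cycles_per_iteration(const struct loop_model *m)
{
    return m->mul_count * m->mul_cycles + m->sub_cycles + m->branch_cycles;
}

/* Total estimate for the whole loop, ignoring call and measurement overhead. */
static uint32_t estimated_cycles(const struct loop_model *m, uint32_t iterations)
{
    return cycles_per_iteration(m) * iterations;
}
```

With 6 MULs at 2 cycles, 1 cycle for the SUBS and 2 for the branch, the model gives 15 cycles per iteration and 900000 cycles for 60,000 iterations. It deliberately ignores dual issue, pipeline interlocks and cache effects.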
                         
Thanks in advance,

with regards,
Raghavendra.M
  • Note: This was originally posted on 17th May 2010 at http://forums.arm.com

    Cycle count tables in the TRM often hide some of the detail for the sake of clarity. Your answer is almost right: it is off by only one cycle per loop iteration, and the remaining 745 cycles are probably just down to cache misses and the initial overhead of calling the function and reading the performance counter.

    It's pretty rare to get cycle counts on a modern core which are exactly right because in reality the hardware isn't as simple as the cycle timing tables in the manual make out.
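
To make the "one cycle per loop plus 745" split concrete, here is a minimal C sketch that divides the gap between a measured and an estimated count into a per-iteration part and a leftover. The names are hypothetical, and it assumes the measured count is at least the estimate and that the iteration count is nonzero.

```c
#include <stdint.h>

/* Hypothetical result type: how the unexplained cycles divide up. */
struct excess {
    uint32_t per_iteration;  /* whole extra cycles per loop iteration */
    uint32_t remainder;      /* leftover cycles not tied to the loop */
};

/* Assumes measured >= estimated and iterations > 0. */
static struct excess split_excess(uint32_t measured, uint32_t estimated,
                                  uint32_t iterations)
{
    uint32_t extra = measured - estimated;
    struct excess e = { extra / iterations, extra % iterations };
    return e;
}
```

For the numbers in this thread (960745 measured, 900000 estimated, 60,000 iterations) the split is 1 extra cycle per iteration with 745 cycles left over, which matches the figures quoted above.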
  • Note: This was originally posted on 17th May 2010 at http://forums.arm.com

    I was wondering how you got 2 cycles for 1 branch, since the Cortex A8 technical reference manual on p. 658 (Section 16-12) states that a branch only takes 1 cycle. If that's the case, the total cycle count should be even less of course, so I'm afraid I can't help with that part.
  • Note: This was originally posted on 13th January 2011 at http://forums.arm.com

    Hi.

    I've tested your code.
    It takes 12 cycles per loop iteration on my BeagleBoard-xM.

    That seems logical, in fact:
    the SUBS and the BNE can be dual-issued with the MUL instructions!

    I don't know how you arrived at 16 cycles!
  • Hi,

    I am interested in your topic. When I try your code, I get compile errors.

    1. There is an error for 'MOV r2, #60000'. It says "invalid constant (ea60) after fixup".

    0xea60 = 60000. It seems that 60000 is out of range.

    2. For

    MUL r8, r8, r4
    MUL r9, r9, r5

    the assembler says that Rd and Rm should be different in MUL.

    I use the Linaro toolchain. Which tools do you use to build this? Why do I get these errors?

    Thanks,
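
On the "invalid constant" error for 60000: a classic ARM (A32) data-processing instruction such as MOV can only encode an immediate that is an 8-bit value rotated right by an even number of bits. The C sketch below checks that rule. The function names are hypothetical, and it assumes A32 encoding; Thumb-2 immediates follow different rules.

```c
#include <stdbool.h>
#include <stdint.h>

/* Rotate a 32-bit value left by n bits (n in 0..31). */
static uint32_t rotl32(uint32_t v, unsigned n)
{
    return n == 0 ? v : (v << n) | (v >> (32 - n));
}

/* True if v is an 8-bit value rotated right by an even amount,
   i.e. encodable as an A32 data-processing immediate (assumption: A32 only). */
static bool arm_immediate_encodable(uint32_t v)
{
    for (unsigned rot = 0; rot < 32; rot += 2) {
        /* Undo a rotate-right by 'rot' and see whether only the low 8 bits remain. */
        if ((rotl32(v, rot) & ~0xFFu) == 0)
            return true;
    }
    return false;
}
```

Under this rule 60000 (0xEA60) is not encodable, because its set bits span 11 bit positions, while small values such as 7 or 9 are. A constant like this is usually loaded another way, for example with MOVW on ARMv7 or with the `LDR r2, =60000` pseudo-instruction.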