Cycles calculation in beagle board

Note: This was originally posted on 22nd November 2010 at http://forums.arm.com

Hello All,

I am trying to use the cycle counter registers of the Cortex-A8 to count cycles.
The following is the code I am using.

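#include <stdio.h>

/* ccnt_init(), ccnt_start(), ccnt_read() and ccnt_stop() are defined elsewhere (not shown). */

int main()
{
     int i;
     int a,b,c,n;
     unsigned int cycles;   /* declaration was missing; unsigned int is an assumption */
     printf("Enter a: ");
     scanf("%d",&a);
     printf("Enter b: ");
     scanf("%d",&b);
     printf("Enter c: ");
     scanf("%d",&c);
     printf("No of times to run: ");
     scanf("%d",&n);

     ccnt_init();
     ccnt_start();
     cycles=ccnt_read();    /* counter value before the timed block */
     {
             c=a+b;
     }
     cycles=ccnt_read()-cycles;   /* elapsed cycles for the block */
     ccnt_stop();
     printf("Sum : %d\n",c);
     printf("Cycles : %u\n",cycles);
     return 0;
}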
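
The post does not show the ccnt_* helpers. As a rough sketch only, they might look like the following on an ARMv7-A core such as the Cortex-A8, using the CP15 performance monitor registers (PMCR, PMCNTENSET, PMCNTENCLR, PMCCNTR). The function bodies are an assumption, not the original poster's code. User-mode access has to be enabled first from privileged code via PMUSERENR, otherwise these instructions fault; under Linux that usually means a small kernel module. The non-ARM branch is a hypothetical stand-in based on clock() so the sketch compiles on other machines.

```c
#include <stdint.h>

#if defined(__arm__)
/* Assumed implementation: ARMv7-A performance monitor registers via CP15. */
static inline void ccnt_init(void)
{
    uint32_t pmcr;
    __asm__ volatile("mrc p15, 0, %0, c9, c12, 0" : "=r"(pmcr)); /* read PMCR */
    pmcr |= (1u << 0) | (1u << 2);  /* E: enable counters, C: reset cycle counter */
    pmcr &= ~(1u << 3);             /* D = 0: count every cycle, not every 64th */
    __asm__ volatile("mcr p15, 0, %0, c9, c12, 0" : : "r"(pmcr));
}

static inline void ccnt_start(void)
{
    /* PMCNTENSET bit 31 enables the cycle counter. */
    __asm__ volatile("mcr p15, 0, %0, c9, c12, 1" : : "r"(1u << 31));
}

static inline void ccnt_stop(void)
{
    /* PMCNTENCLR bit 31 disables the cycle counter. */
    __asm__ volatile("mcr p15, 0, %0, c9, c12, 2" : : "r"(1u << 31));
}

static inline uint32_t ccnt_read(void)
{
    uint32_t value;
    __asm__ volatile("mrc p15, 0, %0, c9, c13, 0" : "=r"(value)); /* read PMCCNTR */
    return value;
}
#else
/* Hypothetical fallback for non-ARM hosts: clock() ticks, not CPU cycles. */
#include <time.h>
static inline void ccnt_init(void)  { }
static inline void ccnt_start(void) { }
static inline void ccnt_stop(void)  { }
static inline uint32_t ccnt_read(void) { return (uint32_t)clock(); }
#endif
```

Because the counter is 32 bits wide, subtracting two readings as unsigned values gives the right elapsed count even if the counter wraps once between them.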

The above simple integer addition takes 5 cycles, since it includes load operations.
If all the above variables are made double, it takes 3800 cycles if I enable NEON and 1900 cycles if I disable NEON.
Kindly explain how I am getting these values.
I am using a BeagleBoard xM-A3 to run this code and the CodeSourcery 2010q1 toolchain to compile it.
I am wondering whether this is due to interrupts. If so, how do I disable interrupts?

Thanks in advance..
  • Note: This was originally posted on 22nd November 2010 at http://forums.arm.com

    Could be due to lazy context switching, if you are running under Linux.

    As many (most?) apps aren't built to use the FPU, Linux simply disables the FPU on a context switch instead of saving/restoring all the FPU regs.  If the newly switched-in app does use the FPU, its first FPU operation will trigger a hardware exception.  This is trapped by the OS, and at that point the FPU is re-enabled and its state saved/restored.  This process means the first FPU op after a context switch takes significantly longer.

    Try doing a dummy calculation using doubles before your timed block.

    The other thing to consider is the length of the block you measure.  The A8 has quite a long pipeline, and measuring a block which is shorter than the pipeline will hide many of the pipeline benefits.
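
Both suggestions above can be combined in one small helper, sketched here under some assumptions. The names time_double_adds, counter_fn and clock_counter are hypothetical. On the board, the counter hook would be the poster's own ccnt_read(); clock() stands in here only so the sketch compiles anywhere. A dummy double operation runs before the first counter read so that any one-off FPU enable cost falls outside the timed region, and the addition is repeated n times so that the timed block is much longer than the pipeline.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical hook for the counter source, e.g. the poster's ccnt_read(). */
typedef uint32_t (*counter_fn)(void);

/* Stand-in counter for hosts without the ARM cycle counter (ticks, not cycles). */
static uint32_t clock_counter(void)
{
    return (uint32_t)clock();
}

/* Times n double additions and returns the elapsed counter ticks.
   The accumulated sum is written to *sum_out so the work is observable. */
static uint32_t time_double_adds(counter_fn read_counter,
                                 double a, double b, int n, double *sum_out)
{
    /* Warm-up: the first FP operation happens before timing starts. */
    volatile double warm = a + b;
    (void)warm;

    /* volatile keeps the compiler from collapsing the loop into one multiply;
       it also forces a load and store of acc on every iteration. */
    volatile double acc = 0.0;

    uint32_t start = read_counter();
    for (int i = 0; i < n; i++)
        acc += a + b;
    uint32_t ticks = read_counter() - start;

    *sum_out = acc;
    return ticks;
}
```

Dividing the returned ticks by n gives an average cost per iteration, which includes the loop overhead and the memory traffic on acc, so it is an upper bound on the cost of the addition itself rather than an exact figure.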