Cycles calculation in beagle board
Senthilkumar N L
over 12 years ago
Note: This was originally posted on 22nd November 2010 at
http://forums.arm.com
Hello All,
I am trying to use the cycle counter registers of the Cortex-A8 to count cycles.
This is the code I am using:
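#include <stdio.h>

/* Cycle-counter helpers. They were not included in the post,
   so these prototypes are assumed. */
void ccnt_init(void);
void ccnt_start(void);
void ccnt_stop(void);
unsigned int ccnt_read(void);

int main(void)
{
    int a, b, c, n;
    unsigned int cycles;

    printf("Enter a: ");
    scanf("%d", &a);
    printf("Enter b: ");
    scanf("%d", &b);
    printf("Enter c: ");
    scanf("%d", &c);
    printf("No of times to run: ");
    scanf("%d", &n);

    ccnt_init();
    ccnt_start();
    cycles = ccnt_read();
    {
        c = a + b;   /* code under test */
    }
    cycles = ccnt_read() - cycles;
    ccnt_stop();

    printf("Sum : %d\n", c);
    printf("Cycles : %u\n", cycles);
    return 0;
}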
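The ccnt_init(), ccnt_start(), ccnt_read() and ccnt_stop() helpers are not shown in the post. A minimal sketch of what they might look like follows, assuming they program the ARMv7-A performance monitor cycle counter through CP15. The USE_PMU macro is a hypothetical build switch, and the fallback branch (nanoseconds from clock_gettime, not cycles) exists only so the file builds on other hosts.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* for clock_gettime in strict ISO C modes */
#endif
#include <stdint.h>
#include <time.h>

#if defined(__arm__) && defined(USE_PMU)   /* USE_PMU: hypothetical switch */

/* Enable the PMU and reset the cycle counter (PMCR). */
static inline void ccnt_init(void)
{
    uint32_t pmcr;
    __asm__ volatile("mrc p15, 0, %0, c9, c12, 0" : "=r"(pmcr));
    pmcr |= (1u << 0) | (1u << 2);  /* E: enable counters, C: reset cycle counter */
    pmcr &= ~(1u << 3);             /* D = 0: count every cycle, not every 64th */
    __asm__ volatile("mcr p15, 0, %0, c9, c12, 0" : : "r"(pmcr));
}

/* Start the cycle counter (PMCNTENSET, bit 31). */
static inline void ccnt_start(void)
{
    __asm__ volatile("mcr p15, 0, %0, c9, c12, 1" : : "r"(1u << 31));
}

/* Stop the cycle counter (PMCNTENCLR, bit 31). */
static inline void ccnt_stop(void)
{
    __asm__ volatile("mcr p15, 0, %0, c9, c12, 2" : : "r"(1u << 31));
}

/* Read the cycle counter (PMCCNTR). */
static inline uint32_t ccnt_read(void)
{
    uint32_t value;
    __asm__ volatile("mrc p15, 0, %0, c9, c13, 0" : "=r"(value));
    return value;
}

#else  /* Portable stand-in: returns nanoseconds, NOT cycles. */

static inline void ccnt_init(void)  {}
static inline void ccnt_start(void) {}
static inline void ccnt_stop(void)  {}

static inline uint32_t ccnt_read(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    /* Truncated to 32 bits so it wraps like the hardware counter does. */
    return (uint32_t)((uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec);
}

#endif
```

Note that from Linux user space these CP15 accesses are undefined unless privileged code has first set the EN bit (bit 0) of the User Enable Register (c9, c14, 0), typically from a small kernel module. Unsigned subtraction of two ccnt_read() values stays correct across a single counter wrap.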
This simple integer addition takes 5 cycles, since it includes load operations.
If all of the variables are made double, it takes 3800 cycles with NEON enabled and 1900 cycles with NEON disabled.
Could you explain why I am getting these values?
I am running this code on a BeagleBoard-xM (rev A3) and compiling it with the CodeSourcery 2010q1 toolchain.
I wonder whether this is due to interrupts. If so, how do I disable them?
Thanks in advance.
Peter Harris
over 12 years ago
Note: This was originally posted on 25th November 2010 at
http://forums.arm.com
You may also want to look at how your floating point is being provided. On Linux it often defaults to a floating-point library (which may be hardware- or software-based) that is loaded at run time. Many Linux distros default to this shared-object implementation, which adds veneer call overhead to every FPU operation. Even with "hard float" you may find that you link against a library rather than emitting floating-point instructions directly into the binary, so dump the image with objdump to check that hard-float instructions are emitted directly.
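One way to make that check easy is to compile a tiny translation unit and disassemble only that. The sketch below is an illustration, not the original poster's code: the file name, function names, and the arm-none-linux-gnueabi- tool prefix in the comment are assumptions, while -mcpu, -mfpu and -mfloat-abi are standard GCC ARM options.

```c
/*
 * fp_check.c (hypothetical file name): small functions whose
 * disassembly shows how floating point is being emitted.
 *
 * Assumed build and inspection commands:
 *   arm-none-linux-gnueabi-gcc -O2 -mcpu=cortex-a8 -mfpu=vfpv3 \
 *       -mfloat-abi=softfp -c fp_check.c
 *   arm-none-linux-gnueabi-objdump -d fp_check.o
 *
 * What to look for in add_double:
 *   - a VFP add instruction (vadd.f64, printed as faddd by some
 *     older tools) means the FPU is used inline;
 *   - a branch to a helper such as __aeabi_dadd means the addition
 *     goes through a library routine instead.
 */

/* Double-precision addition: the case timed in the question. */
double add_double(double a, double b)
{
    return a + b;
}

/* Single-precision counterpart, for comparing the generated code. */
float add_float(float a, float b)
{
    return a + b;
}
```

Rebuilding with -mfloat-abi=soft and comparing the two disassemblies should make the difference between inline FPU code and library calls visible.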
Secondly, symbol resolution for shared objects is commonly "lazy" on Linux: a symbol is resolved the first time it is hit and found to be missing. Part of your 1900 cycles may well be spent resolving a link into the shared object via the dynamic linker. ttfn's idea of performing one operation outside the timing loop before you start timing should solve this.
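A rough sketch of that warm-up pattern is shown here. It is not the original poster's code: now_ns() is a portable stand-in for the cycle counter (it returns nanoseconds from clock_gettime, not cycles), and op() and time_after_warmup() are hypothetical names.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* for clock_gettime in strict ISO C modes */
#endif
#include <stdint.h>
#include <time.h>

/* Stand-in for a cycle-counter read: nanoseconds, not cycles. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* volatile stops the compiler from optimising the addition away. */
static volatile double sink;

/* The operation under test. */
static void op(double a, double b)
{
    sink = a + b;
}

/* Time `reps` additions after one untimed warm-up call. */
uint64_t time_after_warmup(double a, double b, unsigned reps)
{
    /* Warm-up: any lazily bound helper is resolved here, outside
       the timed region, and caches are primed. */
    op(a, b);

    /* Estimate the cost of the timer reads themselves. */
    uint64_t t0 = now_ns();
    uint64_t t1 = now_ns();
    uint64_t overhead = t1 - t0;

    uint64_t start = now_ns();
    for (unsigned i = 0; i < reps; i++) {
        op(a, b);
    }
    uint64_t elapsed = now_ns() - start;

    /* Subtract the read overhead, clamping at zero. */
    return elapsed > overhead ? elapsed - overhead : 0;
}
```

Dividing the result by reps gives a per-operation figure that is less sensitive to one-off costs. With glibc, lazy binding can also be switched off for a run by setting the LD_BIND_NOW environment variable, which is another way to test whether symbol resolution accounts for the measured time.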
Finally, don't use doubles if you can use floats. On an embedded platform doubles are horrendously expensive, and float is fine for most use cases.