Cycles calculation in beagle board
Senthilkumar N L
over 7 years ago
Note: This was originally posted on 22nd November 2010 at http://forums.arm.com
Hello All,
I am trying to use the cycle counter registers of the Cortex-A8 to count cycles.
Following is the code I am using.
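#include <stdio.h>

/* ccnt_init(), ccnt_start(), ccnt_read() and ccnt_stop() are cycle counter
   helpers whose definitions are not shown in this post. */
int main()
{
    int i;
    int a,b,c,n;
    unsigned int cycles;   /* declaration was missing; a 32-bit counter value is assumed */
    printf("Enter a: ");
    scanf("%d",&a);
    printf("Enter b: ");
    scanf("%d",&b);
    printf("Enter c: ");
    scanf("%d",&c);
    printf("No of times to run: ");
    scanf("%d",&n);
    ccnt_init();
    ccnt_start();
    cycles=ccnt_read();
    {
        c=a+b;             /* the timed block: a single addition */
    }
    cycles=ccnt_read()-cycles;
    ccnt_stop();
    printf("Sum : %d\n",c);
    printf("Cycles : %u\n",cycles);
}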
The above simple integer addition takes 5 cycles, since it includes load operations.
If all the above variables are made double, it takes 3800 cycles if I enable NEON and 1900 cycles if I disable NEON.
Kindly explain how I am getting these values.
I am using a BeagleBoard XM-A3 to run this code and the CodeSourcery 2010q1 toolchain to compile it.
I am wondering whether it is due to interrupts. If so, how do I disable the interrupts?
Thanks in advance.
Martin Weidmann
over 7 years ago
Note: This was originally posted on 22nd November 2010 at http://forums.arm.com
Could be due to lazy context switching, if you are running under Linux.
As many (most?) apps aren't built to use the FPU, Linux simply disables the FPU on context switch instead of saving/restoring all the FPU regs. If the newly switched-in app does use the FPU, its first FPU operation will trigger a hardware exception. This is trapped by the OS, and at that point the FPU is re-enabled and state saved/restored. This process means the first op after a context switch takes significantly longer.
Try doing a dummy calculation using doubles before your timed block.
The other thing to consider is the length of the block you measure. The A8 has quite a long pipeline, and measuring a block which is shorter than the pipeline will hide many of the pipeline benefits.
Peter Harris
over 7 years ago
Note: This was originally posted on 25th November 2010 at http://forums.arm.com
You may also want to look at how your floating point is being provided. In many cases on Linux it defaults to a floating-point library (which may be hardware or software) which is loaded at run-time. A lot of Linux distros default to this shared object implementation, which adds veneer call overheads for every FPU operation. Even with "hard float" you may still find you link a library rather than emitting float instructions directly into the binary - so you may want to dump the image using objdump to make sure you are emitting hard-float instructions directly.
Secondly - the linker resolution of symbols in a shared object is commonly "lazy" on Linux - that is, they are resolved the first time they are hit and found to be missing. You may well find that part of your 1900 cycles is spent resolving a link into the shared object via the dynamic linker. ttfn's idea of doing one operation outside of the timing loop before timing should solve this one.
Finally - don't use doubles if you can use floats. For an embedded platform doubles are horrendously expensive - and for most use cases float is fine.
Iso