Cortex A8 Instruction Cycle Timing
barney vardanyan
over 10 years ago
Note: This was originally posted on 17th March 2011 at http://forums.arm.com
Hi, sorry for my bad English.
I need to work out the latency for two instructions, and all I have is the ARM Cortex-A8 documentation (chapter 16).
But I have no idea how to do this using that documentation.
Anil M Sripadarao
over 10 years ago
Note: This was originally posted on 2nd August 2011 at http://forums.arm.com
Hi all,
I am doing some profiling analysis on a Cortex-A8 processor using the BeagleBoard-xM, and I found some strange behavior with the following piece of code. The code takes 46 cycles. But looking at the code, there are no dependencies between the instructions, so ideally it should take only 9 cycles.
Code:
    /* 46 cycles. */
    vld1.32 {d16,d17},[r1:128];
    vmla.f32 d0,d15,d14;
    vld1.32 {d18,d19},[r1:128];
    vmla.f32 d1,d15,d14;
    vld1.32 {d20,d21},[r1:128];
    vmla.f32 d2,d15,d14;
    vld1.32 {d22,d23},[r1:128];
    vmla.f32 d3,d15,d14;
    vld1.32 {d24,d25},[r1:128];
    vmla.f32 d4,d15,d14;
    vld1.32 {d26,d27},[r1:128];
    vmla.f32 d5,d15,d14;
    vld1.32 {d28,d29},[r1:128];
    vmla.f32 d6,d15,d14;
    vld1.32 {d30,d31},[r1:128];
    vmla.f32 d7,d15,d14;
    vld1.32 {d12,d13},[r1:128];
    vmla.f32 d8,d15,d14;
However, if I separate the vmla and vld instructions, the behavior is as expected, i.e. the following two sequences take 9 and 11 cycles respectively.
    /* 9 cycles. */
    vmla.f32 d0,d15,d14;
    vmla.f32 d1,d15,d14;
    vmla.f32 d2,d15,d14;
    vmla.f32 d3,d15,d14;
    vmla.f32 d4,d15,d14;
    vmla.f32 d5,d15,d14;
    vmla.f32 d6,d15,d14;
    vmla.f32 d7,d15,d14;
    vmla.f32 d8,d15,d14;
    /* 11 cycles. */
    vld1.32 {d16,d17},[r1:128];
    vld1.32 {d18,d19},[r1:128];
    vld1.32 {d20,d21},[r1:128];
    vld1.32 {d22,d23},[r1:128];
    vld1.32 {d24,d25},[r1:128];
    vld1.32 {d26,d27},[r1:128];
    vld1.32 {d28,d29},[r1:128];
    vld1.32 {d30,d31},[r1:128];
    vld1.32 {d12,d13},[r1:128];
Can someone please let me know whether I am missing something here or whether my understanding is wrong?
Thanks,
Anil M S
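
For reference, the arithmetic behind the three measurements above can be laid out as a small Python sketch. It only restates the cycle counts reported in this post. The two reference points (`ideal_dual_issue` and `no_overlap`) are illustrative assumptions for comparison, not measurements, and the sketch does not attempt to explain the cause of the slowdown.

```python
# Cycle counts reported in the post (Cortex-A8 on a BeagleBoard-xM).
interleaved = 46   # 9 vld1 + 9 vmla, alternating
vmla_only = 9      # 9 vmla back to back
vld_only = 11      # 9 vld1 back to back
pairs = 9          # number of vld1/vmla pairs in the interleaved block

# Hypothetical reference points for comparison (assumptions, not measurements):
ideal_dual_issue = pairs            # the post's expectation: one pair per cycle
no_overlap = vmla_only + vld_only   # the two streams run one after the other

extra_vs_no_overlap = interleaved - no_overlap
cycles_per_pair = interleaved / pairs

print("expected (perfect pairing):", ideal_dual_issue)   # 9
print("no overlap at all:         ", no_overlap)         # 20
print("measured interleaved:      ", interleaved)        # 46
print("extra over no overlap:     ", extra_vs_no_overlap)  # 26
print("cycles per vld1/vmla pair: ", round(cycles_per_pair, 2))
```

Put this way, the interleaved sequence is slower than running the two separated sequences one after the other, which suggests that some cost is being paid on each vld1/vmla pair rather than issue slots simply going unused.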