
Instruction cycle timings for VLD1, VST1 on Cortex-A8

Note: This was originally posted on 13th March 2012 at http://forums.arm.com

Hi,

I am new to the BeagleBoard and the Cortex-A8. I have written a small piece of code to understand Cortex-A8 instruction cycle timings. The code runs in a loop of 10,000 iterations, and it behaves differently with different instruction combinations. My code and the cycle counts I measured follow.

When I keep only the loads, the code takes 10 cycles instead of the 6 I expected. There should be no cache effects in this case, since every load reads the same memory address into a 'q' register.

VLD1.S32  {rq0},[r11@128] 
VLD1.S32  {rq1},[r11@128]
      
VLD1.S32  {rq3},[r11@128]
VLD1.S32  {rq5},[r11@128]
 
VLD1.S32  {rq6},[r11@128]
VLD1.S32  {rq7},[r11@128]

The code below takes 13 cycles instead of 6. The only difference is that the code above uses loads and this code uses stores.

VST1.S32  {rq0},[r12@128] 
VST1.S32  {rq1},[r12@128]
      
VST1.S32  {rq3},[r12@128]
VST1.S32  {rq5},[r12@128]
 
VST1.S32  {rq6},[r12@128]
VST1.S32  {rq7},[r12@128]

The combination of loads and stores works as expected and takes 12 cycles. But when I change the base register in the store operations from r12 to r11, the code takes 32 cycles. Why does using r11 for both the loads and the stores cost so many more cycles? The listing below shows the version with r12 in the stores.

VLD1.S32  {rq0},[r11@128] 
VLD1.S32  {rq1},[r11@128]
      
VLD1.S32  {rq3},[r11@128]
VLD1.S32  {rq5},[r11@128]
 
VLD1.S32  {rq6},[r11@128]
VLD1.S32  {rq7},[r11@128]

VST1.S32  {rq0},[r12@128] 
VST1.S32  {rq1},[r12@128]
      
VST1.S32  {rq3},[r12@128]
VST1.S32  {rq5},[r12@128]
 
VST1.S32  {rq6},[r12@128]
VST1.S32  {rq7},[r12@128]

Why is this happening? Why do the different combinations behave so differently? Can anyone please explain?
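
For anyone reproducing this: one way to turn a total cycle count for the 10,000-iteration loop into a per-iteration figure like those quoted above is sketched below. This is only an illustration, not necessarily how the numbers in this post were obtained. It assumes two readings of a free-running 32-bit cycle counter (the Cortex-A8 performance monitor cycle counter is 32 bits wide), and the function name and the overhead parameter are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper (not from the original post): average cycles per
 * loop iteration, given two readings of a free-running 32-bit cycle
 * counter taken before and after the loop.
 *
 * overhead is an assumed, separately measured cost of the measurement
 * itself (for example, the same loop with an empty body). */
double cycles_per_iteration(uint32_t start, uint32_t end,
                            uint32_t iterations, uint32_t overhead)
{
    /* Unsigned subtraction stays correct across a single counter wrap. */
    uint32_t elapsed = end - start;

    /* Guard against division by zero and a nonsensical overhead value. */
    if (iterations == 0 || elapsed < overhead)
        return 0.0;

    return (double)(elapsed - overhead) / (double)iterations;
}
```

With 10,000 iterations, a net difference of 120,000 cycles between the two readings corresponds to 12 cycles per iteration. Averaging over the whole loop also folds in loop-branch and pipeline effects, so a per-iteration figure obtained this way need not match a simple sum of documented instruction timings.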

Thanks in advance,
Chandrakala