Cycle penalty before using a register as a pointer

Note: This was originally posted on 28th January 2011 at http://forums.arm.com

Hi.

It seems that you can't use a modified register as a load address in the very next cycle (on the Cortex-A8).


For example,

ADD   r0, r0, #16
LDR   r1,[r0]

does not execute in 2 cycles but in 3.
I've looked through the ARM documentation for an explanation of this penalty cycle, but I can't find one!

If I simply use the Cortex-A8 cycle tables:
ADD writes its result in E2,
while LDR needs r0 in E1.

So if I just apply those rules, the two instructions should execute in 2 cycles.
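One way to turn those two stage numbers into a cycle count is sketched below in Python. It assumes a simple in-order, single-issue forwarding model (an assumption, not something quoted from ARM's documentation): a result produced at the end of stage Ep can be forwarded to a consumer that needs it at the start of stage Ec, so a dependent instruction issued right after its producer waits max(0, p - c) cycles. The helper names `stall_cycles` and `pair_cycles` are hypothetical, dual issue is ignored, and the E2 source stage used for the ADD-then-ADD contrast is assumed rather than taken from the post.

```python
# Hypothetical model of an in-order, single-issue pipeline (dual issue is
# ignored). An instruction issued in cycle t occupies stage Ek in cycle
# t + k - 1. A result produced at the end of stage Ep can be forwarded to a
# consumer that needs it at the start of stage Ec, so the consumer must reach
# Ec no earlier than the cycle after the producer occupies Ep.


def stall_cycles(result_stage: int, needed_stage: int) -> int:
    """Extra cycles a dependent instruction waits when issued right after its producer."""
    return max(0, result_stage - needed_stage)


def pair_cycles(result_stage: int, needed_stage: int) -> int:
    """Issue cycles for a producer followed immediately by a dependent consumer."""
    return 2 + stall_cycles(result_stage, needed_stage)


# Stage numbers taken from the post: ADD writes r0 in E2, LDR needs r0 in E1.
ADD_RESULT = 2
LDR_BASE_NEEDED = 1
# ASSUMPTION for contrast: an ALU instruction that reads its source in E2.
ALU_SOURCE_NEEDED = 2

print("ADD -> LDR [r0]:", pair_cycles(ADD_RESULT, LDR_BASE_NEEDED), "cycles")
print("ADD -> ADD     :", pair_cycles(ADD_RESULT, ALU_SOURCE_NEEDED), "cycles")
```

Under this model, ADD followed by `LDR r1, [r0]` takes 3 cycles, because the base register is needed one stage earlier than ADD produces it, while an ADD feeding another ADD takes 2.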

So, can anybody tell me where this pipeline-dependent latency is explained (or even just mentioned)?

Thanks

Etienne SOBOLE over 12 years ago

Note: This was originally posted on 31st January 2011 at http://forums.arm.com

Well, to benchmark my code I'm using this loop:

.loop:
@ bench code
smuad   r9, r9, r9
mov     r1, #0
smuad   r10, r10, r10
smuad   r11, r11, r11
smuad   r12, r12, r12
mov     r5, #0
subs    r0, r0, #1
bgt     .loop

This loop takes 5 cycles per iteration. I replace "@ bench code" with the code I want to benchmark.
With this test protocol,

ldr   r1, [r8]
add   r2, r1, r1
takes 3 cycles.

mov   r8, r7
ldr   r1, [r8]
add   r2, r1, r1
takes 4 cycles.

So I conclude that modifying a register just before using it as a pointer costs 1 cycle,
while there is also one lost cycle before the ADD!

I've tried to simulate the four functional units of the Cortex-A8, but so far I haven't managed to reproduce the real results!
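The same stall rule can be applied to the two benchmarked sequences. This Python sketch checks each instruction only against the one directly before it and reports one stall count per dependency. It does not model dual issue, so it says nothing about the total loop time. The stage numbers in `STAGES` (LDR data ready in E3, MOV and ADD results ready in E2, ADD sources needed in E2, LDR base needed in E1) are assumptions to be checked against the Cortex-A8 cycle tables, and `dependency_stalls` is a hypothetical helper.

```python
# Hypothetical stall check using an in-order rule: a consumer issued right
# after its producer waits max(0, result_stage - needed_stage) cycles.
# Dual issue is NOT modelled, so this predicts stalls per dependency only.
# ASSUMED stages (verify against the Cortex-A8 TRM): LDR data ready in E3,
# MOV/ADD results ready in E2, ADD sources needed in E2, LDR base needed in E1.

STAGES = {
    # opcode: (result_stage, {operand_role: needed_stage})
    "mov": (2, {"src": 2}),
    "ldr": (3, {"base": 1}),
    "add": (2, {"src": 2}),
}


def dependency_stalls(sequence):
    """Return one (producer, consumer, stall) tuple per back-to-back dependency.

    `sequence` is a list of (opcode, dest, {role: register}) tuples; each
    instruction is checked only against the one directly before it.
    """
    stalls = []
    for prev, cur in zip(sequence, sequence[1:]):
        prev_op, prev_dest, _ = prev
        cur_op, _, cur_srcs = cur
        result_stage = STAGES[prev_op][0]
        for role, reg in cur_srcs.items():
            if reg == prev_dest:
                needed = STAGES[cur_op][1][role]
                stalls.append((prev_op, cur_op, max(0, result_stage - needed)))
    return stalls


# ldr r1, [r8] / add r2, r1, r1
seq_a = [("ldr", "r1", {"base": "r8"}), ("add", "r2", {"src": "r1"})]
# mov r8, r7 / ldr r1, [r8] / add r2, r1, r1
seq_b = [("mov", "r8", {"src": "r7"})] + seq_a

print(dependency_stalls(seq_a))
print(dependency_stalls(seq_b))
```

With these assumed stages, the first sequence reports one stall between LDR and ADD, and the second adds one more between MOV and LDR, in line with the one-cycle pointer penalty concluded above.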