To do this check, I intend to create a database like the one below:

Instruction   Cycle   Rd   Rm   Rn
Mul           2       E5   E1   E1
...

When my tool reads an instruction, it looks the instruction up in this database to get the available cycle of each register. However, this seems like a lot of work for me at the moment. Do you have any other ideas? Could you share how you implemented it?
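A minimal sketch of such a lookup table in Python. Only the Mul row comes from the table in the post; the names `Timing`, `TIMING` and `lookup` are hypothetical and do not describe the layout of any existing database.

```python
from typing import NamedTuple, Optional


class Timing(NamedTuple):
    cycles: int  # value from the "Cycle" column
    rd: str      # stage listed for the destination register Rd
    rm: str      # stage listed for the source register Rm
    rn: str      # stage listed for the source register Rn


# Hypothetical table: only the Mul row is taken from the post.
TIMING = {
    "mul": Timing(cycles=2, rd="E5", rm="E1", rn="E1"),
}


def lookup(mnemonic: str) -> Optional[Timing]:
    """Return the timing row for a mnemonic, or None if it is not in the table."""
    return TIMING.get(mnemonic.lower())


print(lookup("MUL"))
print(lookup("smlal"))  # not in this small table, so None
```

A tool built this way would still need one row per instruction variant, which is where most of the work lies.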
Thank you very much. Your database will help me save a lot of time. If I find any mistakes, I will inform you immediately. I hope your next version will be online soon.
I am confused.

According to the specs, ADD needs its source registers at E2, and its destination register is also available at E2. So these two instructions should be able to dual-issue:

add r1, r2, r3
add r4, r5, r1

because the second ADD requires r1 at E2 and the first ADD makes r1 available at E2 as well. If ADD needed its source registers at E1, I would agree that the two instructions above can't be dual-issued.

One explanation, I think, is that ADD needs its source registers at the beginning of E2 and makes its destination register available at the end of E2. But then why don't the specs say that the destination register is available at E3?

I know I'm wrong, but I can't explain why.
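The two readings in this question can be compared with a small Python sketch. It assumes a simplified model in which a pipeline stage is just a number, and the names `result_ready_stage` and `can_dual_issue` are hypothetical. It illustrates the question only and is not a statement of confirmed Cortex-A8 behaviour.

```python
def result_ready_stage(produce_stage: int, ready_at_end: bool) -> int:
    """Stage at whose start a later instruction can first read the result.

    ready_at_end=True models the reading "produced at the END of the stage",
    so the value is usable only from the start of the following stage.
    """
    return produce_stage + 1 if ready_at_end else produce_stage


def can_dual_issue(produce_stage: int, need_stage: int, ready_at_end: bool) -> bool:
    """Both instructions issue in the same cycle, so they reach each stage together.

    The consumer is satisfied only if it needs the value no earlier than
    the stage at which the value is ready.
    """
    return need_stage >= result_ready_stage(produce_stage, ready_at_end)


# add r1, r2, r3 followed by add r4, r5, r1 (result at E2, source needed at E2)
print(can_dual_issue(2, 2, ready_at_end=False))  # literal reading of the table
print(can_dual_issue(2, 2, ready_at_end=True))   # "start of E2 / end of E2" reading
```

In this model the literal reading allows the pair to dual-issue, while the end-of-stage reading does not, which is the distinction the question turns on.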
LDRD r0, r1, [r5]!
LDR r3, [r6]!
I have some questions about this file: http://pulsar.websha...x-A8-cycle.xlsx
1. Why are some lines darker than others? What does that mean? For example, the line that contains "MUL::MLA" is darker.
2. What does "::" mean, for example in "MUL::MLA" or "MUL::SMLAWB"?
I have 2 questions about load-multiple instructions. For example: ldm r1!, {r2, r3}
This instruction takes 2 cycles, so it is broken down into 2 single-cycle operations.
1. Because write-back is enabled, r1 is written in the E2 stage. But in which single-cycle operation is r1 written: the first or the second?
2. When are r2 and r3 available to other instructions? I cannot find any scheduling information in the specs. I assume this instruction is similar to LDR, meaning that r2 and r3 are available in E3. Because r3 is written by the second single-cycle operation, it is available in E3 of the second single-cycle operation.
If my assumption is correct, the 2 instructions below should produce a 2-cycle penalty:
ldm r1, {r2, r3}
ldr r4, [r3, #1]
However, the result is different in http://pulsar.websha...sult.php?lng=fr
Please explain this to me.
I am sorry, but I am still confused. For example: ldm r1, {r2, r3}
Assume that this instruction starts at cycle n. If it took only 1 cycle, r2 and r3 would be available at cycle n + 3. However, it takes 2 cycles, so when are r2 and r3 available: at (n + 3) or (n + 4)?
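The arithmetic behind this question can be written out as a short Python sketch. It assumes, as the question does, that each single-cycle operation behaves like an LDR whose result is available 3 cycles after that operation starts. Which operation loads which register is exactly the open point, so it is passed in as a hypothetical `op_of_reg` mapping; the function names are also hypothetical.

```python
def available_cycle(start: int, op_index: int, result_offset: int = 3) -> int:
    """Cycle at which a register loaded by the op_index-th single-cycle
    operation (0-based) becomes available, under the stated assumption."""
    return start + op_index + result_offset


def ldm_availability(start: int, op_of_reg: dict) -> dict:
    """Map each loaded register to its assumed availability cycle."""
    return {reg: available_cycle(start, op) for reg, op in op_of_reg.items()}


n = 0  # cycle at which ldm r1, {r2, r3} starts

# Hypothetical split: r2 loaded by the first operation, r3 by the second.
print(ldm_availability(n, {"r2": 0, "r3": 1}))

# Hypothetical split: both registers loaded by the second operation.
print(ldm_availability(n, {"r2": 1, "r3": 1}))
```

Under the first split r2 is ready at n + 3 and r3 at n + 4; under the second both are ready at n + 4. The sketch does not say which split the hardware uses.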
Now I understand how hard it is to find cycle timings for all instructions.
Instructions that are not found are treated as unrecognized, right?
I tried some instructions such as SETEND, BLKP, SMI and SMC, and your cycle count module reported them as unrecognized.
How can I get the latest version? Is it here: http://pulsar.websha...x-A8-cycle.xlsx
I found that some instructions have been updated. For example, SUBS pc, lr, #imm isn't in "cortex-A8-cycle.xlsx", but it is available in http://pulsar.websha...ult.php?lng=fr.
/^\s*(and|eor|sub|rsb|add|adc|sbc|rcsc|orr|bic)(al|eq|ne|cs|cc|mi|pl|vs|vc|hi|ls|ge|lt|gt|le|lo|hs)?(s)?()(\s+(r\d|r[1][012345]|sb|sl|fp|ip|sp|pc|lr)\s*,\s*(r\d|r[1][012345]|sb|sl|fp|ip|sp|pc|lr)\s*,\s*([^;@,\[\]:]*)\s*)?(?:\s(@.*|\/\/.*))?$/iU
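To see what this pattern captures, here is a rough Python port. It is a sketch under several assumptions. The trailing `/iU` looks like PCRE-style flags, and Python's `re` has no equivalent of `U` (which inverts quantifier greediness), so the port uses Python's default greedy quantifiers and strips whitespace from the captured operand; this should not change whether a line matches, only what the groups capture. The empty `()` group is dropped, and the group names and the `parse` helper are hypothetical. The mnemonic alternatives are copied from the pattern as posted.

```python
import re

# Register alternatives, copied from the posted pattern.
REG = r"r\d|r1[0-5]|sb|sl|fp|ip|sp|pc|lr"

PATTERN = re.compile(
    r"^\s*(?P<op>and|eor|sub|rsb|add|adc|sbc|rcsc|orr|bic)"
    r"(?P<cond>al|eq|ne|cs|cc|mi|pl|vs|vc|hi|ls|ge|lt|gt|le|lo|hs)?"
    r"(?P<s>s)?"
    rf"(?:\s+(?P<rd>{REG})\s*,\s*(?P<rn>{REG})\s*,\s*(?P<op2>[^;@,\[\]:]*)\s*)?"
    r"(?:\s(?P<comment>@.*|//.*))?$",
    re.IGNORECASE,
)


def parse(line: str):
    """Return the named groups for a matching line, or None if it does not match."""
    m = PATTERN.match(line)
    if m is None:
        return None
    fields = m.groupdict()
    if fields["op2"] is not None:
        fields["op2"] = fields["op2"].strip()  # greedy capture may include spaces
    return fields


print(parse("adds r1, r2, #16"))
print(parse("addeq r10, sp, r3 @ comment"))
print(parse("add r0, r1, r2, lsl #2"))
```

The last call returns `None` because the third operand group excludes commas, so a shifted-register operand is not covered by this particular pattern and would need a pattern of its own.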
I can't understand why there are so many cases. I guess you count the registers and derive the number of cycles from the formula in the specs. Please explain this to me if you can.
How do you handle this situation (my example)? I guess that when you know the available stage of a register is E2, you treat it as follows:
- If the register is a source, you know it is available at E2
- If the register is a destination, you know it is available at E3
Is my guess right?
Hi Etienne,
I have checked some floating-point instructions: VADD, VSUB, VABD, VMUL, VCEQ, VCGE, VCGT, VCAGE, VCAGT, VMAX, VMIN.
In the specs, they all require their source registers at the N2 stage. However, in your database (Excel file), they require their source registers at the N1 stage. Why is there this difference?
Hi all,

I am doing some profiling analysis on the Cortex-A8 processor using the BeagleBoard-xM. I found strange behavior with the following piece of code. The code takes 46 cycles, but looking at it we can see that there are no dependencies between the instructions, so ideally it should have taken only 9 cycles.

Code:

    /* 46 cycles. */
    vld1.32 {d16,d17},[r1:128];
    vmla.f32 d0,d15,d14;
    vld1.32 {d18,d19},[r1:128];
    vmla.f32 d1,d15,d14;
    vld1.32 {d20,d21},[r1:128];
    vmla.f32 d2,d15,d14;
    vld1.32 {d22,d23},[r1:128];
    vmla.f32 d3,d15,d14;
    vld1.32 {d24,d25},[r1:128];
    vmla.f32 d4,d15,d14;
    vld1.32 {d26,d27},[r1:128];
    vmla.f32 d5,d15,d14;
    vld1.32 {d28,d29},[r1:128];
    vmla.f32 d6,d15,d14;
    vld1.32 {d30,d31},[r1:128];
    vmla.f32 d7,d15,d14;
    vld1.32 {d12,d13},[r1:128];
    vmla.f32 d8,d15,d14;

However, if I separate the vmla and vld instructions, the behavior is as expected: the following two blocks take 9 and 11 cycles respectively.

    /* 9 cycles. */
    vmla.f32 d0,d15,d14;
    vmla.f32 d1,d15,d14;
    vmla.f32 d2,d15,d14;
    vmla.f32 d3,d15,d14;
    vmla.f32 d4,d15,d14;
    vmla.f32 d5,d15,d14;
    vmla.f32 d6,d15,d14;
    vmla.f32 d7,d15,d14;
    vmla.f32 d8,d15,d14;

    /* 11 cycles. */
    vld1.32 {d16,d17},[r1:128];
    vld1.32 {d18,d19},[r1:128];
    vld1.32 {d20,d21},[r1:128];
    vld1.32 {d22,d23},[r1:128];
    vld1.32 {d24,d25},[r1:128];
    vld1.32 {d26,d27},[r1:128];
    vld1.32 {d28,d29},[r1:128];
    vld1.32 {d30,d31},[r1:128];
    vld1.32 {d12,d13},[r1:128];

Can someone please let me know whether I am missing something here or whether my understanding is wrong?

Thanks,
Anil M S
    add r2, r1, #16
    add r3, r2, #16
    add r4, r3, #16
    b .loop1
    .align 4
.loop1:
    vld1.32 {d16,d17},[r1:128]
    vmul.f32 d0,d15,d14
    vld1.32 {d18,d19},[r2:128]
    vmul.f32 d1,d15,d14
    vld1.32 {d20,d21},[r3:128]
    vmul.f32 d2,d15,d14
    vld1.32 {d22,d23},[r4:128]
    vmul.f32 d3,d15,d14
    vld1.32 {d24,d25},[r1:128]
    vmul.f32 d4,d15,d14
    vld1.32 {d26,d27},[r2:128]
    vmul.f32 d5,d15,d14
    vld1.32 {d28,d29},[r3:128]
    vmul.f32 d6,d15,d14
    vld1.32 {d30,d31},[r4:128]
    vmul.f32 d7,d15,d14
    subs r0, r0, #1
    bgt .loop1
smlal r0, r1, r3, r4
smlal r0, r1, r3, r4
smlal takes 3 cycles, and its destination registers are available in E5. So the first instruction releases r0 and r1 at cycle 3 + 5 = 8.
Hi Etienne. Have a good day.
So far, I have found some instructions that your cycle count module can't analyze, and I don't know why. Please check them and give me some explanation:
vbic.i16 d0, #1 ; 0x0001
vbic.i32 q2, #1 ; 0x00000001
vmov.i16 q0, #1 ; 0x0001
vmov.i16 d0, #1 ; 0x0001
vmvn.i16 q1, #1 ; 0x0001
vmvn.i16 d1, #1 ; 0x0001
vorr.i16 q0, #1 ; 0x0001
vorr.i16 d0, #1 ; 0x0001
Dung