
Cortex A8 Instruction Cycle Timing

Note: This was originally posted on 17th March 2011 at http://forums.arm.com

Hi, sorry for my bad English.

I need to calculate the latency between two instructions, and all I have is the ARM Cortex-A8 documentation (chapter 16)!
But I have no idea how to do this using that documentation.
  • Note: This was originally posted on 28th April 2011 at http://forums.arm.com


    2 - The ARM starts to execute the instruction and locks the destination registers (to prevent any other instruction from using the same registers as sources).
    For example, with our previous MUL:
    Rd is locked until cycle #16 (#10 + 5 because Rd is at E5, + 1 because the MUL takes 2 cycles, and destination stages are always given for the last cycle of a multicycle instruction).
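    Step 2 can be sketched as a one-line formula. This is a minimal sketch, assuming a cycle counter that records, for each destination register, the last cycle in which it is still locked; `lock_until` and its parameters are hypothetical names, not Webshaker's code. It reproduces the MUL example: issued at cycle #10, Rd at E5, 2 cycles.

```python
def lock_until(issue_cycle, dst_stage, cycles=1):
    """Return the last cycle in which a destination register is still locked.

    issue_cycle: cycle at which the instruction starts (e.g. #10).
    dst_stage:   n from the TRM stage "En" given for the destination (E5 -> 5).
    cycles:      issue cycles of the instruction; the destination stage is
                 counted from the last of them, hence the (cycles - 1) term.
    """
    return issue_cycle + dst_stage + (cycles - 1)


# MUL issued at cycle #10, Rd at E5, 2 cycles: Rd stays locked until cycle #16.
print(lock_until(10, 5, cycles=2))  # 16
```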



    Is it right that if the next instruction uses Rd as an operand, it has to wait until after cycle #16 to start execution? If so, I think it is wasteful, because if there is no dependency, the next instruction could start execution at cycle #13 or #14.

    Is my reasoning right?

    Dung!
  • Note: This was originally posted on 28th April 2011 at http://forums.arm.com



    I thought the branches to .else would always be mispredicted, but that was not the case.
    It would be very useful to know the prediction algorithm (but I assume it must be quite secret ;) )!!!


    Why do you think the branch to .else will always be mispredicted?

    As I read in chapter 5, "Program Flow Prediction", of the Cortex-A8 Technical Reference Manual, it is always predicted. Have you ever read this chapter?
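    To see why a branch to .else is not simply "always mispredicted", here is a generic 2-bit saturating counter, a common textbook prediction scheme. It is an illustration only, not the Cortex-A8's actual predictor (chapter 5 of the TRM describes that one); the function name and the initial counter state are assumptions. The number of mispredictions depends on the run-time history of outcomes, not only on the source code.

```python
def count_mispredictions(outcomes, initial_state=1):
    """Count mispredictions of a generic 2-bit saturating counter.

    States 0 and 1 predict "not taken", states 2 and 3 predict "taken".
    Each actual outcome moves the counter one step towards it (saturating).
    """
    state = initial_state
    mispredictions = 0
    for taken in outcomes:
        predicted_taken = state >= 2
        if predicted_taken != taken:
            mispredictions += 1
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return mispredictions


# Taken on every iteration: only the first guess is wrong.
print(count_mispredictions([True] * 8))                      # 1
# Taken every other iteration: this simple counter is wrong every time.
print(count_mispredictions([True, False] * 4))               # 8
# Taken three times out of four: 2 misses in the first period, then 1 per period.
print(count_mispredictions([True, True, True, False] * 4))   # 5
```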
  • Note: This was originally posted on 28th April 2011 at http://forums.arm.com


    For branches:
    I do not know anything about the first stage of the ARM pipeline.
    I don't know what you want to do.


    My purpose, I think, is simple: I want to develop a tool that counts the number of cycles needed to execute a short piece of source code.

    Therefore, I have studied the Cortex-A8 pipeline, the branch penalty, and the latency between 2 consecutive instructions.

    I have found many things that I hardly understand. I hope, together, we can make them clear.


    But I think there is no way to know, just from the source code, whether a (conditional) branch will be mispredicted or not.


    Yes, I agree with you. To know whether a branch is mispredicted or not, we need to check the CPSR and SPSR registers, check the PC address ...

    I don't have a Cortex-A8 board; I am just a man of theory :((
  • Note: This was originally posted on 16th May 2011 at http://forums.arm.com

    Webshaker, I found that your new version is available at http://pulsar.webshaker.net/ccc/result.php?lng=fr
    However, you have changed the format of the results, right?
    Could you explain the meaning of "a.1-0    1c" and similar entries?
  • Note: This was originally posted on 25th May 2011 at http://forums.arm.com

    Dear Webshaker,
    I am thinking about how to test the Cortex-A8 cycle count module.
    I think that, for each instruction, I have to combine it with every other instruction to see how they work together.
    However, I have a problem: the ARM instruction set is very large, so the number of test cases is very large too.

    Do you have any other ideas for testing?
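    The growth is quadratic: n instructions give n × n ordered pairs to test, before operand and dependency variants are even counted. A minimal sketch of that count follows; the instruction list is a tiny hypothetical sample, and grouping instructions into timing classes is an assumption of this sketch, not a method proposed in the thread.

```python
from itertools import product

# Hypothetical sample of mnemonics, each mapped to an assumed timing class
# (instructions in the same class are assumed to follow the same rules).
TIMING_CLASS = {
    "ADD": "alu", "SUB": "alu", "AND": "alu", "ORR": "alu",
    "MUL": "mul", "MLA": "mul",
    "LDR": "load", "STR": "store",
}


def ordered_pairs(items):
    """All ordered (first, second) pairs, including an item paired with itself."""
    return list(product(items, repeat=2))


instruction_pairs = ordered_pairs(sorted(TIMING_CLASS))
class_pairs = ordered_pairs(sorted(set(TIMING_CLASS.values())))
print(len(instruction_pairs), len(class_pairs))  # 64 16
```

    With 8 mnemonics in 4 classes, testing one representative per class pair cuts 64 cases to 16; the saving only holds if the class assumption is true for every member.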
  • Note: This was originally posted on 10th May 2011 at http://forums.arm.com


    1 - Before starting an instruction, the ARM checks that all the registers will be available when the instruction needs them.
    For example:
    You want to execute MUL Rd, Rm, Rs.
    Rm must be available at cycle #11 (#10 + 1; see the MUL cycle table http://infocenter.ar...ch16s02s03.html).
    If at least 1 register is not available, the ARM does not start the instruction and you get a stall cycle.
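    Step 1 can be sketched the same way. This is a hypothetical helper, not Webshaker's implementation; it assumes that "locked until cycle #N" is inclusive (so a register becomes readable at cycle N + 1), that stages are given as the n of "En", and, for the example, that both MUL source registers are needed at E1.

```python
def stall_cycles(issue_cycle, sources, locked_until):
    """Return how many stall cycles an instruction needs before it can start.

    sources:      (register, stage) pairs, stage being n from "En", the stage
                  at which the instruction reads that register.
    locked_until: register -> last cycle it is still locked (assumed
                  inclusive); registers missing from the dict are free.
    """
    stall = 0
    for register, stage in sources:
        needed_at = issue_cycle + stage                # e.g. Rm at E1 -> cycle #11
        ready_at = locked_until.get(register, -1) + 1  # first cycle it can be read
        stall = max(stall, ready_at - needed_at)
    return stall


# MUL r0, r2, r3 started at #10 needs r2 and r3 at E1 (cycle #11).
# Hypothetical state: r2 is locked by an earlier instruction until cycle #13.
print(stall_cycles(10, [("r2", 1), ("r3", 1)], {"r2": 13}))  # 3
```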

    To do this check, I intend to create a database as below:
    Instruction   Cycles   Rd   Rm   Rn
    MUL           2        E5   E1   E1
    ...
    When my tool reads an instruction, it looks up this database to get the availability stage of each register. However, it seems to be a lot of work for me at the moment :((
    Do you have any other ideas?
    Can you share your implementation ideas with me?
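    One way to hold such a table in the tool, sketched with hypothetical names (`TIMING`, `stage_number`, `lookup`). Only the MUL row comes from the post above; every other instruction would have to be added from the TRM cycle tables.

```python
import re

# Hypothetical timing database: only the MUL row comes from the post above;
# the other instructions would be added from the TRM cycle tables.
TIMING = {
    "MUL": {"cycles": 2, "Rd": "E5", "Rm": "E1", "Rn": "E1"},
}


def stage_number(stage):
    """Convert a stage name such as "E5" into the integer 5."""
    match = re.fullmatch(r"E(\d+)", stage)
    if match is None:
        raise ValueError(f"unexpected stage name: {stage!r}")
    return int(match.group(1))


def lookup(mnemonic):
    """Return (cycles, {operand: stage number}) for an instruction mnemonic."""
    row = TIMING[mnemonic.upper()]
    stages = {operand: stage_number(name)
              for operand, name in row.items() if operand != "cycles"}
    return row["cycles"], stages


print(lookup("mul"))  # (2, {'Rd': 5, 'Rm': 1, 'Rn': 1})
```

    Keeping stage names as text ("E5") in the table and converting on lookup lets the table be copied almost verbatim from the documentation.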
  • Note: This was originally posted on 10th May 2011 at http://forums.arm.com


    http://pulsar.websha...x-A8-cycle.xlsx

    Thank you very much. Your database will help me save a lot of time. If I find any mistake, I will inform you immediately.
    I hope your next version will be online soon.
  • Note: This was originally posted on 11th May 2011 at http://forums.arm.com

    I am confused.
    According to the specs, ADD needs its source registers at E2, and its destination register is available at E2 too. So these 2 instructions could be dual-issued:
    add r1, r2, r3
    add r4, r5, r1
    This is because the second ADD requires r1 at E2 and the first ADD makes r1 available at E2 as well.
    If ADD needed its source registers at E1, I would agree that the 2 instructions above can't be dual-issued.
    One explanation, I think, is that ADD needs its source registers at the beginning of E2 and makes its destination register available at the end of E2. However, why don't the specs say that the destination register is available at E3?
    I know I'm wrong, but I can't explain why.
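    The explanation proposed above (sources read at the start of a stage, results written at the end of it) can be checked with a small sketch. This models only that hypothesis, not confirmed Cortex-A8 behaviour, and `dependent_stall` is a hypothetical name.

```python
def dependent_stall(producer_dst_stage, consumer_src_stage, issue_gap):
    """Stall cycles for an instruction that reads the previous one's result.

    Hypothesis being modelled: the producer writes its result at the END of
    stage E<producer_dst_stage>, and the consumer needs it at the START of
    stage E<consumer_src_stage>.
    issue_gap is the number of cycles between the two issues; 0 = dual issue.
    """
    first_readable = producer_dst_stage + 1   # the cycle after the producer's stage
    needed = issue_gap + consumer_src_stage   # measured from the producer's issue
    return max(0, first_readable - needed)


# add r1, r2, r3 / add r4, r5, r1: both stages are E2.
print(dependent_stall(2, 2, issue_gap=0))  # 1 -> cannot be dual-issued
print(dependent_stall(2, 2, issue_gap=1))  # 0 -> fine one cycle later
```

    Under this hypothesis, a destination "available at E2" behaves exactly like one readable from E3, which is why the two ADDs cannot be dual-issued.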
  • Note: This was originally posted on 29th April 2011 at http://forums.arm.com


    Buy a BeagleBoard... http://www.watterott.../BeagleBoard-xM
    It is not very expensive !!!

    Unfortunately, it is a lot of money for me. Also, this board is hard to buy in my country :(
    Webshaker, you said you count the number of cycles when instructions reach the "functional unit". By "functional unit", you mean the execution stage, don't you?
    Could you explain how you determine that 2 instructions can be executed concurrently, in pipe0 and pipe1?
  • Note: This was originally posted on 28th June 2011 at http://forums.arm.com

    I am sorry, but I am still confused. For example: ldm r1, {r2, r3}
    Assume that this instruction starts at cycle n.
    If this instruction took only 1 cycle, r2 and r3 would be available at cycle n + 3.
    However, this instruction takes 2 cycles, so when are r2 and r3 available: at n + 3 or at n + 4?
  • Note: This was originally posted on 26th May 2011 at http://forums.arm.com


    I'll write a post in a few days (weeks) to explain how the cycle counter works and how you can write your own cycle counter...
    That will be much simpler than trying to explain piece by piece how the program works !!!

    I am looking forward to reading your explanation :))


    But your solution is not a good solution... too much work !!!

    You're right. I'll find another way to test.
  • Note: This was originally posted on 14th June 2011 at http://forums.arm.com

    Looking at "cortex-a8-cycle.xls", there are some points I can't understand clearly:

    1. What is the difference between the "dstCond", "cc-dst1", and "cc-dst2" columns?
    2. In the status register access instructions, MRS appears on 2 lines with the same type "dst,psr". One line says that MRS takes 8 cycles; the other says it takes only 1 cycle. Why is there this difference?
  • Note: This was originally posted on 17th June 2011 at http://forums.arm.com

    Hi Etienne.
    Thank you very much for your explanation.

    For MRS and MSR: there are a lot of instructions for which I haven't found real cycle timings, and I do not have time to test them.

    Now I understand how hard it is to find cycle timings for all instructions.
    Instructions whose timings you haven't found are treated as unrecognized, right?
    I tried some instructions such as SETEND, BLKP, SMI, and SMC, and your cycle count module said they were unrecognized.

    Take the latest version (but keep the previous one, because I've changed a lot of things).

    How can I get the latest version? Is it this one: http://pulsar.websha...x-A8-cycle.xlsx
    I found that some instructions have been updated. For example, SUBS pc, lr, #imm isn't in "cortex-A8-cycle.xlsx", but it is available at http://pulsar.websha...ult.php?lng=fr.

    For example, I removed all the STM and LDM rules. There are too many cases. Now I build these rules automatically in the cycle counter.

    I can't understand why there are too many cases. I guess you count the registers and get the number of cycles from the formula in the specs.
    Please explain it to me if you can.
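    One reason enumerating LDM/STM rules by hand is impractical: the register list is a 16-bit mask, so there are 2^16 - 1 = 65535 non-empty lists before addressing modes, writeback, and condition codes are even considered. Building the rule on the fly only needs the parsed register list. The sketch below is hypothetical (all names are invented here); `placeholder_cycles` assumes two registers per cycle purely as a stand-in and must be replaced by the formula from the TRM.

```python
import math
import re

NONEMPTY_REGISTER_LISTS = 2 ** 16 - 1  # one bit per register r0-r15


def parse_register_list(text):
    """Parse an LDM/STM register list such as "{r2, r3}" or "{r4-r7, lr}".

    Returns the sorted register numbers (sp = 13, lr = 14, pc = 15).
    """
    aliases = {"sp": 13, "lr": 14, "pc": 15}

    def reg_number(name):
        name = name.strip().lower()
        if name in aliases:
            return aliases[name]
        match = re.fullmatch(r"r(\d+)", name)
        if match is None or int(match.group(1)) > 15:
            raise ValueError(f"bad register: {name!r}")
        return int(match.group(1))

    registers = set()
    for item in text.strip().strip("{}").split(","):
        if "-" in item:  # a range such as r4-r7
            first, last = (reg_number(part) for part in item.split("-"))
            registers.update(range(first, last + 1))
        else:
            registers.add(reg_number(item))
    return sorted(registers)


def placeholder_cycles(register_count):
    """HYPOTHETICAL cost (two registers per cycle); replace with the TRM formula."""
    return max(1, math.ceil(register_count / 2))


print(parse_register_list("{r4-r7, lr}"))  # [4, 5, 6, 7, 14]
print(placeholder_cycles(5))               # 3
```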
  • Note: This was originally posted on 17th June 2011 at http://forums.arm.com

    Hi Etienne
    Thank you very much for sharing

    Ben Avison has done some very useful work on that
    http://www.avison.me.../cortex-a8.html

    I am sorry, but I can't access this link.
    I don't know why :((
  • Note: This was originally posted on 11th May 2011 at http://forums.arm.com

    How do you handle this situation (my example)?
    I guess that when you know the availability stage of a register is E2, you treat it as follows:
    - If the register is a source, it is available at E2
    - If the register is a destination, it is available at E3
    Is my guess right?