How to understand the behavior of hazard in Cortex-M4?

Hello to all,

I am working with a Cortex-M4 and would like to understand how it behaves when there is a pipeline hazard. To see the effect of a data hazard, I executed a few test programs. For example:

Program-1 (hazard situation):

    LDR R5,[R6,#offset]
    ADD R5,R8,R2

Program-2 (no hazard situation):

    LDR R5,[R6,#offset]
    ADD R3,R8,R2

Here R6 holds an SRAM0 address. Comparing the two programs, Program-1 clearly contains a hazard and Program-2 does not. However, when I executed both, I observed no difference in either the current consumption or the number of cycles. Why is that?

Also, I could not find any documentation on the number of cycles consumed. From my observations I put together the following table:

Offset   LDR R5,[R6,#offset] only   ADD only         LDR and ADD combined
         (cycle count)              (cycle count)    (cycle count)
0        1                          1                1.5
1        3                          1                3
2        2                          1                2.5
3        3                          1                3

Therefore I have three questions:

  1. Can anybody explain why both programs behave the same?
  2. Is this the correct way to create a hazard situation in the pipeline? If not, could you provide another example?
  3. The difference in the cycle counts is really confusing. Can anybody explain why this happens?

Thanking you,

Kind regards,

Himanshu

  • 1. The Cortex-M4 processor is based on a simple three-stage pipeline: fetch, decode, execute. In the code example you mentioned, the execution cycles of the LDR and the ADD do not overlap at all. Even if the ADD instruction doesn't use the data from the LDR, the pipeline is still stalled until the load has completed. As a result, you won't see any cycle difference between such code sequences.

    2. Due to the simple nature of the Cortex-M4 pipeline, I don't think we can generate a load-use hazard (a load always stalls the pipeline until it has completed).

    However, you might be able to generate hazards with MAC operations (e.g. back-to-back UMLAL instructions, with the accumulator result of the first used as a multiply input of the second).

    3. Cycle timing is documented in the Cortex-M4 Technical Reference Manual: http://infocenter.arm.com/help/topic/com.arm.doc.100166_0001_00_en/ric1417175924567.html

    regards,

    Joseph
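The back-to-back multiply-accumulate case from point 2 of the answer above can be sketched in C. This is only an illustration under assumptions: the function names are hypothetical, and whether a compiler really emits two adjacent UMLAL instructions for these expressions depends on the toolchain and optimization level, so the disassembly should be checked before timing anything. The dependent version feeds the low word of the first accumulator into the second multiply; the independent version gives a baseline without that dependency.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: two 64-bit multiply-accumulates in which the second
 * multiply takes the low word of the first result as an input.
 * Assumption: on a Cortex-M4 the compiler maps each statement to UMLAL. */
void mac_dependent(uint64_t *acc0, uint64_t *acc1, uint32_t a, uint32_t b)
{
    assert(acc0 != 0 && acc1 != 0);
    *acc0 += (uint64_t)a * b;                 /* first MAC */
    *acc1 += (uint64_t)(uint32_t)*acc0 * b;   /* second MAC uses low word of acc0 */
}

/* Hypothetical baseline: the same two operations, but the second multiply
 * does not read anything produced by the first. */
void mac_independent(uint64_t *acc0, uint64_t *acc1, uint32_t a, uint32_t b)
{
    assert(acc0 != 0 && acc1 != 0);
    *acc0 += (uint64_t)a * b;   /* first MAC */
    *acc1 += (uint64_t)a * b;   /* second MAC, no dependency on acc0 */
}
```

Timing both functions over many repetitions and comparing the results would show whether the dependency costs extra cycles on a given device.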

  • Dear Mr. Yiu,

    Thank you very much for the information. I have gone through the link you provided, which lists the cycle counts for every instruction when executed individually. However, I could not work out how to compute the cycle count for a combination of two or more instructions. From the documentation, I understand that several different factors influence the cycle count.

    Sorry to ask again: is there a formula for calculating the cycle count of a sequence of two or more instructions (as in the original question, where only an LDR and an ADD are executed as a pair), or can it only be determined by experiment?

    Very sorry for the trouble.

    Thanking you once again,

    Regards,

    Himanshu
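Since the follow-up asks whether the timing of a sequence can be found by experiment, here is a minimal sketch of the arithmetic behind such a measurement. It assumes the sequence is repeated a known number of times between two reads of a free-running 32-bit cycle counter (on Cortex-M4 parts that implement the DWT unit, CMSIS exposes one as DWT->CYCCNT), and that the harness overhead has been found separately by timing an empty run. The function names and parameters are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Cycles between two reads of a free-running 32-bit counter.
 * Unsigned subtraction stays correct across a single wraparound. */
uint32_t elapsed_cycles(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Average cycles per repetition of the instruction sequence under test.
 * overhead:    cycles measured for the same harness with an empty body
 * repetitions: how many times the sequence ran between the two reads */
double cycles_per_repetition(uint32_t start, uint32_t end,
                             uint32_t overhead, uint32_t repetitions)
{
    uint32_t total = elapsed_cycles(start, end);
    assert(repetitions > 0);     /* avoid division by zero */
    assert(total >= overhead);   /* overhead cannot exceed the measurement */
    return (double)(total - overhead) / (double)repetitions;
}
```

For instance, counter readings of 100 and 403 with an overhead of 3 cycles and 100 repetitions give 3.0 cycles per repetition.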