How to understand the behavior of hazard in Cortex-M4?

Hello to all,

I am working with a Cortex-M4 and would like to understand how it behaves in hazard situations. To see the effect of a data hazard, I executed a few test programs. For example:

Program-1 : Hazard Situation

LDR R5,[R6,#offset]
ADD R5,R8,R2

Program-2 : No Hazard Situation

LDR R5,[R6,#offset]
ADD R3,R8,R2

Here R6 holds an SRAM0 address. From the two programs it seems clear that Program-1 should produce a hazard and Program-2 should not. But when I executed both, I observed NO DIFFERENCE in the CURRENT CONSUMPTION or in the NUMBER of CYCLES. Why is that?
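To make the register dependencies in the two programs explicit, here is a minimal sketch that classifies them as read-after-write (RAW), write-after-write (WAW) or write-after-read (WAR). The `Insn` class and the `classify` helper are hypothetical names introduced only for this illustration, and the read/write sets are filled in by hand from the instruction text. It models register usage only and says nothing about how the Cortex-M4 pipeline actually reacts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insn:
    """Hypothetical, simplified view of one instruction's register usage."""
    text: str
    writes: frozenset  # registers the instruction writes
    reads: frozenset   # registers the instruction reads


def classify(first: Insn, second: Insn) -> dict:
    """Report the register dependencies between two consecutive instructions."""
    return {
        "RAW": bool(first.writes & second.reads),   # second reads what first writes
        "WAW": bool(first.writes & second.writes),  # both write the same register
        "WAR": bool(first.reads & second.writes),   # second overwrites a source of first
    }


# Read/write sets entered by hand from the two programs in the question.
ldr = Insn("LDR R5,[R6,#offset]", frozenset({"R5"}), frozenset({"R6"}))
add_p1 = Insn("ADD R5,R8,R2", frozenset({"R5"}), frozenset({"R8", "R2"}))
add_p2 = Insn("ADD R3,R8,R2", frozenset({"R3"}), frozenset({"R8", "R2"}))

program_1 = classify(ldr, add_p1)
program_2 = classify(ldr, add_p2)
print("Program-1:", program_1)
print("Program-2:", program_2)
```

Under this model, the ADD in Program-1 overwrites R5 but never reads it, so the pair has a write-after-write dependency rather than a load-use (read-after-write) one. A load-use pair would need the ADD to take R5 as a source operand.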

Also, I could not find any documentation on the number of cycles consumed. From my observations, I have compiled the following table:

Offset	LDR R5,[R6,#offset] alone (cycles)	ADD alone (cycles)	LDR and ADD combined (cycles)
0	1	1	1.5
1	3	1	3
2	2	1	2.5
3	3	1	3

Therefore I have three questions:

1. Why do both programs behave the same?
2. Is this the correct way to generate a hazard in the pipeline? If not, can you provide another example?
3. The difference in the cycle counts is really confusing. Can anybody explain this behavior?

Thanking you,

Kind regards,

Himanshu

Top replies

Joseph Yiu over 8 years ago +1 verified


1. The Cortex-M4 processor is based on a simple 3-stage pipeline: fetch, decode, execute. In the code example you've mentioned, the execution cycles of LDR and ADD do not overlap at all. Even if the ADD instruction doesn't use the data from the LDR, the pipeline is still stalled until the load is completed. As a result, you wouldn't see any cycle difference between such code sequences.

2. Due to the simple nature of the Cortex-M4 pipeline, I don't think we can generate a load-use hazard (a load always stalls the pipeline until it is completed).

However, you might be able to generate hazards with MAC operations (e.g. back-to-back UMLAL instructions, where the accumulator result of the first is used as a multiply input of the second).

3. Cycle timing is documented in Cortex-M4 Technical Reference Manual : http://infocenter.arm.com/help/topic/com.arm.doc.100166_0001_00_en/ric1417175924567.html
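As a rough illustration of point 1, the sketch below adds up per-instruction cycle counts on the assumption that execute stages never overlap. The values in `ASSUMED_CYCLES` (2 for LDR, 1 for ADD) are assumptions believed to match the manual's instruction table for zero-wait-state memory, and should be checked against the manual for your device. `estimate_cycles` is a hypothetical helper, not a formula from Arm. It ignores wait states, alignment, instruction fetch effects and any pipelining of neighbouring loads and stores.

```python
# Assumed per-instruction cycle counts (zero-wait-state memory).
# These are illustrative values; verify them against the manual.
ASSUMED_CYCLES = {"LDR": 2, "ADD": 1}


def estimate_cycles(mnemonics):
    """First-order estimate: execute stages do not overlap, so costs add up."""
    return sum(ASSUMED_CYCLES[m] for m in mnemonics)


# Only the mnemonics matter in this model; register operands are not an input.
program_1 = ["LDR", "ADD"]  # LDR R5,[R6,#offset] ; ADD R5,R8,R2
program_2 = ["LDR", "ADD"]  # LDR R5,[R6,#offset] ; ADD R3,R8,R2

print(estimate_cycles(program_1), estimate_cycles(program_2))
```

Because the estimate does not depend on which registers the ADD uses, it gives the same total for both programs, which is consistent with seeing no cycle difference between them. Measured values can still differ from such an estimate for the reasons listed above.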

regards,

Joseph

0 HimanshuDoshi19 over 8 years ago in reply to Joseph Yiu

Dear Mr. Yiu,

Thank you very much for the information. I have gone through the link you provided, which lists the cycle counts for every instruction when executed individually. However, I could not work out how to compute the cycle count for a combination of two or more instructions. From the documentation, I understand that the cycle count depends on several different factors.

Sorry to ask again, but is there a formula for calculating the cycle count of a sequence of two or more instructions (as in the original question, where only an LDR and an ADD are executed as a pair), or can it only be determined by experiment?

Very sorry for the trouble.

Thanking you once again,

Regards,

Himanshu