MCPS analysis for ARM9,ARM7 and cortex-A8
Praveen Kumar
over 12 years ago
Peter Harris
over 12 years ago
Note: This was originally posted on 19th October 2012 at http://forums.arm.com
> Can any one please explain me the reason of this behaviour

ARM11 is generally faster for compiled code, but it is easy to write assembler tuned for an ARM9 that runs slower on an ARM11. Without looking at your code it is hard to say exactly why. My guess is that you have some tight loops in your assembler which don't play nicely with the ARM11 branch predictor.

* Always try to make sure you have two other instructions between the flag-setting operation and the use of the condition in a branch.
* Don't branch to a branch instruction; the second one will always fail to predict.
* ARM11 has a two-cycle load-use penalty, so don't use a loaded register in the next instruction or you will get stalls.
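To make the load-use point concrete, here is a rough toy model in Python. It is only a sketch: the instruction tuple format, the function name `load_use_stalls` and the register names are made up for illustration, and the two-cycle figure is simply the one quoted above. A real ARM11 pipeline has more cases (forwarding paths, address versus data use) that this ignores.

```python
# Toy model only: counts stall cycles caused by using a loaded register
# too soon. It is NOT a cycle-accurate ARM11 model.
LOAD_USE_PENALTY = 2  # figure quoted in the post above


def load_use_stalls(instrs, penalty=LOAD_USE_PENALTY):
    """Count stall cycles for a list of (opcode, dest, sources) tuples.

    The tuple format is hypothetical; only the opcode "ldr" is treated
    as a load, every other opcode is assumed to have no result latency.
    """
    stalls = 0
    ready_at = {}  # register -> first cycle at which a loaded value is usable
    cycle = 0
    for opcode, dest, sources in instrs:
        # Wait until every source that came from a recent load is available.
        wait = max((ready_at.get(reg, 0) - cycle for reg in sources), default=0)
        if wait > 0:
            stalls += wait
            cycle += wait
        if opcode == "ldr":
            ready_at[dest] = cycle + 1 + penalty
        else:
            ready_at.pop(dest, None)  # overwritten by a non-load result
        cycle += 1
    return stalls


# Loaded register used by the very next instruction.
naive = [
    ("ldr", "r0", ["r1"]),
    ("add", "r2", ["r0", "r3"]),
]

# Same work, with two independent instructions moved in between.
scheduled = [
    ("ldr", "r0", ["r1"]),
    ("ldr", "r4", ["r5"]),
    ("add", "r6", ["r6"]),
    ("add", "r2", ["r0", "r3"]),
]

print(load_use_stalls(naive), load_use_stalls(scheduled))
```

Under these assumptions the first sequence stalls and the second does not, which is the kind of reordering the last bullet is asking for.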
You also don't say what your benchmarking setup is in terms of the three CPU frequencies. ARM11 will probably be slower than an ARM9 at the same frequency "on average" because its pipeline is longer. However, the longer pipeline also allows a significantly higher top clock speed, which is where much of the performance comes from. Cortex-A8 is a dual-issue machine, so it should be faster at the same frequency.
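The cycles-versus-frequency trade-off is just arithmetic, sketched below. The cycle counts and clock speeds are invented placeholders, not measurements of any real core, and `run_time_ms` is a hypothetical helper name.

```python
def run_time_ms(cycles, freq_mhz):
    """Wall-clock time in milliseconds for `cycles` at a clock of `freq_mhz`.

    1 MHz is 1000 cycles per millisecond, hence the factor of 1000.
    """
    return cycles / (freq_mhz * 1000.0)


# Hypothetical numbers for illustration only.
shorter_pipeline = run_time_ms(cycles=1_000_000, freq_mhz=200)  # fewer cycles, lower clock
longer_pipeline = run_time_ms(cycles=1_200_000, freq_mhz=400)   # more cycles, higher clock

print(shorter_pipeline, longer_pipeline)
```

With these made-up figures the core that needs 20% more cycles still finishes sooner because it is clocked twice as fast, so a comparison of cycle counts alone only makes sense once the clock frequencies are stated.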