MCPS analysis for ARM9,ARM7 and cortex-A8
Praveen Kumar
over 7 years ago
Peter Harris
over 7 years ago
Note: This was originally posted on 19th October 2012 at http://forums.arm.com
> Can any one please explain me the reason of this behaviour

ARM11 is generally faster for compiled code, but it is easy to write assembler for an ARM9 and have it run slower on an ARM11. Without looking at your code it is hard to say exactly why. My guess is that you have some tight loops in your assembler which don't play nicely with the ARM11 branch predictor.

 * Always try to make sure you have two other instructions between the flag-setting operation and the use of the condition in a branch.
 * Don't branch to a branch instruction; the second one will always fail to predict.
 * ARM11 has a two-cycle load-use penalty, so don't use loaded registers in the next instruction or you will get stalls.
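To make the load-use point concrete, here is a rough sketch of how instruction ordering changes the stall count. It is a toy in-order, single-issue model and not a cycle-accurate ARM11 simulator: the function name `count_load_use_stalls`, the tuple encoding of instructions, and the exact way the penalty is applied are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical encoding: (register written by a load, or None; registers read).
Instr = Tuple[Optional[str], Tuple[str, ...]]


def count_load_use_stalls(program: List[Instr], load_use_penalty: int = 2) -> int:
    """Count stall cycles in a toy in-order, one-instruction-per-cycle pipeline.

    Assumption: a register written by a load issued at cycle t can first be
    read by an instruction issued at cycle t + 1 + load_use_penalty.
    Results of non-load instructions are treated as available immediately.
    """
    ready_at: Dict[str, int] = {}  # register -> first cycle it may be read
    cycle = 0
    stalls = 0
    for loaded_reg, sources in program:
        # The instruction must wait for the slowest of its source registers.
        earliest = max((ready_at.get(reg, 0) for reg in sources), default=0)
        if earliest > cycle:
            stalls += earliest - cycle
            cycle = earliest
        if loaded_reg is not None:
            ready_at[loaded_reg] = cycle + 1 + load_use_penalty
        cycle += 1
    return stalls


# Loaded value used by the very next instruction.
naive = [("r0", ("r1",)), (None, ("r0",)), (None, ("r2",)), (None, ("r3",))]
# Same work, with two independent instructions moved between load and use.
scheduled = [("r0", ("r1",)), (None, ("r2",)), (None, ("r3",)), (None, ("r0",))]

print(count_load_use_stalls(naive))      # stalls with back-to-back load and use
print(count_load_use_stalls(scheduled))  # stalls after rescheduling
```

The same "put independent work in the gap" idea applies to the flag-setting tip: filling the slots between the producer and the consumer hides the latency instead of stalling on it.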
You also don't say what your benchmarking setup is in terms of the three CPU frequencies. At the same frequency, ARM11 will probably be slower than an ARM9 "on average" because the pipeline is longer. However, the longer pipeline gives it a significantly higher top clock speed, which is where much of the performance comes from. Cortex-A8 is a dual-issue machine, so it should be faster at the same frequency.
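As a rough illustration of why the clock frequencies matter when comparing cycle counts, the sketch below converts cycles into wall-clock time and into an MCPS budget. The helper names and all the numbers are placeholders invented for the example; they are not measurements from any of these cores.

```python
def execution_time_us(cycles: int, clock_mhz: float) -> float:
    """Wall-clock time in microseconds for `cycles` at `clock_mhz` MHz."""
    return cycles / clock_mhz


def required_mcps(cycles_per_frame: int, frames_per_second: float) -> float:
    """Million cycles per second needed to process frames in real time."""
    return cycles_per_frame * frames_per_second / 1e6


# Placeholder figures only: core B needs 20% more cycles than core A
# for the same routine, but is clocked twice as fast.
core_a_time = execution_time_us(cycles=1_000_000, clock_mhz=200.0)
core_b_time = execution_time_us(cycles=1_200_000, clock_mhz=400.0)

print(core_a_time, core_b_time)
print(required_mcps(cycles_per_frame=1_000_000, frames_per_second=50))
```

With these made-up figures, the core with the worse cycle count still finishes sooner, which is why a cycle-count comparison needs the clock frequency of each part stated alongside it.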