Neon reg to ARM reg data transfer
Vishwa Vishwa
over 7 years ago
Note: This was originally posted on 30th July 2009 at http://forums.arm.com
I'm transferring data from a NEON register to an ARM register, and it is very costly: each vmov.32 r6,d0[2] takes around 13 to 18 cycles.
That adds up quickly in a function that runs many times.
Can anyone suggest a way around this?
Thanks in advance for any help. :)
Vishwa Vishwa
over 7 years ago
Note: This was originally posted on 13th August 2009 at http://forums.arm.com
Hi guys,
I found a way out of this. There were some 16 NEON-to-ARM transfer instructions in my function, accounting for roughly 208 cycles per call (the latency of vmov.u32 r6,d26[0] is about 14 cycles). The function was called some 20 thousand times, so the transfers were eating up a lot of time.
But there was an opportunity in it: interleaving 14 independent instructions between each transfer and the first use of its result.
The code was something like this:
vmov.32 r6,d[0]
- instructions which use r6 -
- instruction -
- instruction -
- instruction - ...
vmov.32 r6,d[0]
- instructions which use r6 -
- instruction -
- instruction -
- instruction - ...
... etc.
Given this code structure, the remaining option was to interleave: take the instructions that use the result in r6 immediately and move them down to just before the next NEON-to-ARM transfer, so that the independent instructions in between execute while the transfer completes. This assumes that the instructions in between really are independent of r6. The change gave a gain of around 4%.
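To make the reordering concrete, here is a rough Python sketch of a toy single-issue, in-order pipeline that counts cycles for the two orderings. Everything in it is an assumption made for illustration: the names `build` and `simulate`, the one-cycle cost of ordinary instructions, and the fixed 14-cycle transfer latency are not taken from any Arm documentation, and a real core will behave differently.

```python
from typing import Dict, List, Optional, Tuple

# (destination register or None, source registers, is it a NEON-to-ARM transfer?)
Instr = Tuple[Optional[str], Tuple[str, ...], bool]

TRANSFER: Instr = ("r6", (), True)        # stands in for a vmov.32 r6, <NEON lane>
CONSUMER: Instr = (None, ("r6",), False)  # first instruction that reads r6
FILLER: Instr = (None, (), False)         # instruction that does not touch r6


def build(blocks: int, fillers: int, interleave: bool) -> List[Instr]:
    """Build a hypothetical instruction stream of `blocks` transfer groups."""
    program: List[Instr] = []
    for _ in range(blocks):
        if interleave:
            # Independent work first, the use of r6 last.
            program += [TRANSFER] + [FILLER] * fillers + [CONSUMER]
        else:
            # Original order: r6 is used right after the transfer.
            program += [TRANSFER, CONSUMER] + [FILLER] * fillers
    return program


def simulate(program: List[Instr], transfer_latency: int = 14) -> int:
    """Return total cycles in a toy model: one instruction issues per cycle,
    and an instruction stalls until all of its source registers are ready."""
    ready: Dict[str, int] = {}  # register -> cycle its value becomes available
    cycle = 0
    for dest, srcs, is_transfer in program:
        start = max([cycle] + [ready.get(s, 0) for s in srcs])
        if dest is not None:
            ready[dest] = start + (transfer_latency if is_transfer else 1)
        cycle = start + 1
    return cycle


if __name__ == "__main__":
    before = simulate(build(16, 14, interleave=False))
    after = simulate(build(16, 14, interleave=True))
    print(before, after, before - after)  # 464 256 208
```

In this model the original order stalls 13 cycles after each of the 16 transfers, which is 16 x 13 = 208 cycles, and the interleaved order hides all of them. The measured gain on real hardware was only around 4%, so treat the model as a way to reason about the ordering rather than as a prediction.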
Thanks a lot for your help!