How to shuffle bits and Check high bit value using Neon Intrinsics?
Rahul Budhiraja
over 7 years ago
Note: This was originally posted on 1st November 2011 at http://forums.arm.com
Hi,
I am trying to convert code written with SSE3 intrinsics to NEON SIMD and am stuck on a shuffle function. I have looked at the GCC intrinsics and the ARM manuals but have not been able to find a solution.
Is there an equivalent of the _mm_shuffle_epi8 function (SSSE3)? Any suggestions on how to implement this would be really appreciated, since I can't seem to get past it. I know that a lookup-table instruction exists, but it does not do an initial comparison like _mm_shuffle_epi8, so I am not sure how to implement this.
Also, I need to check only the high bit of each byte in a register. Is there any way to check the high-bit value of each element in a vector? I looked at the manual and was not able to find anything relevant. Any help or info would be sincerely appreciated.
Cheers,
Gilead Kutnick
over 7 years ago
Note: This was originally posted on 2nd November 2011 at http://forums.arm.com
vtbl actually does have a special case for setting the value to zero. The only difference between it and SSSE3's pshufb is that vtbl sets the result to zero if any of the out-of-range bits of the index are set, not just the most significant bit. If you're using tables of 16 values, as pshufb does, the out-of-range bits are bits 4 through 7 of each index. If your input may have any of bits 4 through 6 set, you can clear them before the vtbl by using vand or vbic.
If you're working with 128-bit vectors, you do have to use vtbl twice: once for the lower half of the result and once for the upper half.
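The approach above can be sketched with NEON intrinsics roughly as follows. This is a minimal sketch and not code from the thread: the function name `pshufb_sketch` and the array-based interface are hypothetical, chosen only for illustration. It assumes `arm_neon.h` is available when NEON is enabled, and it includes a scalar fallback with the same semantics so the behaviour can be checked on a non-ARM host.

```c
#include <stdint.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper emulating pshufb / _mm_shuffle_epi8 on 16 bytes:
 * out[i] = (idx[i] & 0x80) ? 0 : table[idx[i] & 0x0F]               */
void pshufb_sketch(const uint8_t table[16], const uint8_t idx[16],
                   uint8_t out[16])
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* Split the 16-byte table into the two 8-byte halves vtbl2 expects. */
    const uint8x16_t t = vld1q_u8(table);
    uint8x8x2_t tbl;
    tbl.val[0] = vget_low_u8(t);
    tbl.val[1] = vget_high_u8(t);

    /* Clear bits 4..6 so that only bit 7 can push an index out of range. */
    const uint8x16_t k = vandq_u8(vld1q_u8(idx), vdupq_n_u8(0x8F));

    /* One table lookup per 64-bit half; out-of-range indexes yield 0. */
    const uint8x8_t lo = vtbl2_u8(tbl, vget_low_u8(k));
    const uint8x8_t hi = vtbl2_u8(tbl, vget_high_u8(k));
    vst1q_u8(out, vcombine_u8(lo, hi));
#else
    /* Scalar fallback mirroring the vand + vtbl behaviour above. */
    for (int i = 0; i < 16; ++i) {
        const uint8_t k = (uint8_t)(idx[i] & 0x8F);
        out[i] = (k < 16) ? table[k] : 0;
    }
#endif
}
```

The 0x8F mask keeps bit 7 (the pshufb "zero this lane" flag) and the low four index bits, which is the vand step described above.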
As for your second question, we need to know more about what you mean by "check" the most significant bits. If you want to generate a byte mask that's 0xFF where the MSB is set and 0x00 where it isn't, you can do it with vclt.s8 against #0, vtst.8 against a vector of 0x80, or vshr.s8 by 7 (I recommend the first one). If you want to pack the MSBs into an 8-bit mask (one bit per byte of a 64-bit vector), like pmovmskb does, that will take more code. If at all possible, it would be best to change the algorithm so it doesn't need this. But if you must have it, you can do it with the following:
- Expand the MSB to a byte mask using one of the above methods
- Isolate a different single bit in each byte by ANDing the byte mask against a vector containing { 0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80 }
- Combine the bits using a series of three parallel adds (vpadd)
This works best if you can do it over more than one vector worth of bytes so the later vpadds have more data to work with, and can hide latency better.
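The three steps above can be sketched with NEON intrinsics as shown below. This is an illustrative sketch, not code from the thread: the name `movemask_sketch` and the array-based interface are hypothetical. It processes 16 bytes at once so that the first vpadd combines both 64-bit halves, in the spirit of the last remark, and it includes a scalar fallback with the same result for non-ARM hosts.

```c
#include <stdint.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: bit i of the result is the MSB of p[i],
 * i.e. what pmovmskb produces for a 128-bit register.          */
uint16_t movemask_sketch(const uint8_t p[16])
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    static const uint8_t weights[8] =
        {0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80};
    const uint8x8_t w = vld1_u8(weights);

    /* Step 1: byte mask, 0xFF where the MSB is set (signed compare < 0). */
    const int8x16_t v = vreinterpretq_s8_u8(vld1q_u8(p));
    const uint8x16_t m = vcltq_s8(v, vdupq_n_s8(0));

    /* Step 2: keep a different single bit in each byte of each half. */
    const uint8x8_t lo = vand_u8(vget_low_u8(m), w);
    const uint8x8_t hi = vand_u8(vget_high_u8(m), w);

    /* Step 3: three pairwise adds; the bits are distinct, so adding
     * behaves like OR. Lane 0 ends up with the low byte of the mask,
     * lane 1 with the high byte.                                     */
    uint8x8_t s = vpadd_u8(lo, hi);
    s = vpadd_u8(s, s);
    s = vpadd_u8(s, s);
    return (uint16_t)(vget_lane_u8(s, 0) | (vget_lane_u8(s, 1) << 8));
#else
    /* Scalar fallback computing the same mask. */
    uint16_t mask = 0;
    for (int i = 0; i < 16; ++i)
        mask |= (uint16_t)((p[i] >> 7) << i);
    return mask;
#endif
}
```

If only the per-byte 0xFF/0x00 mask is needed, step 1 alone is enough and the remaining steps can be dropped.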