<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="https://community.arm.com/utility/feedstylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>How to shuffle bits and Check high bit value using Neon Intrinsics?</title><link>https://community.arm.com/developer/tools-software/tools/f/armds-forum/560/how-to-shuffle-bits-and-check-high-bit-value-using-neon-intrinsics</link><description> Note: This was originally posted on 1st November 2011 at http://forums.arm.com Hi, I am trying to convert code written in SSE3 intrinsics to NEON SIMD and am stuck because of a shuffle function. I have looked at the GCC intrinsics, ARM manuals but</description><dc:language>en-US</dc:language><generator>Telligent Community 10</generator><item><title>RE: How to shuffle bits and Check high bit value using Neon Intrinsics?</title><link>https://community.arm.com/thread/1284?ContentTypeID=1</link><pubDate>Wed, 11 Sep 2013 10:57:18 GMT</pubDate><guid isPermaLink="false">dd9e70c8-6d3c-4c71-b136-2456382a7b5c:a9018bc5-9f63-498b-b681-2b6e5bf72c09</guid><dc:creator>Gilead Kutnick</dc:creator><description>&lt;div&gt;&lt;i&gt;Note: This was originally posted on 2nd November 2011 at &lt;a href="http://forums.arm.com"&gt;http://forums.arm.com&lt;/a&gt;&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;&lt;span&gt;vtbl actually does have a special case for setting the value to zero. The only difference between it and SSSE3&amp;#39;s pshufb is that it will set the result to zero if any of the out-of-range bits of the index are set, not just if the most significant bit is. If you&amp;#39;re using tables of 16 values like pshufb, that refers to bits 4 through 7 of the indexes. 
If for some reason your input has any of bits 4 through 6 set, you can clear them before the vtbl by using vand or vbic.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span&gt;You do have to use vtbl twice to get both the lower and upper halves if you&amp;#39;re working with 128-bit vectors.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span&gt;As for your second question, we need to know more about what you mean by &amp;quot;look&amp;quot; at the most significant bits. If you want to generate a byte mask that&amp;#39;s 0xFF where the MSB is set and 0x00 where it isn&amp;#39;t, you can accomplish it with vclt.s8 #0, vtst.8, or vshr.s8 (I recommend the first one). If you want to pack the MSBs into an 8-bit mask like pmovmskb does, that&amp;#39;ll take more code. If at all possible, it&amp;#39;d be best to change the algorithm to not need this. But if you must have it, you can do it with the following:&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span&gt;- Expand the MSB to a byte mask using one of the above methods&lt;/span&gt;&lt;br /&gt;&lt;span&gt;- Isolate a different single bit in each byte by ANDing the byte mask against a vector containing { 0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80 }&lt;/span&gt;&lt;br /&gt;&lt;span&gt;- Combine the bits using a series of three pairwise adds (vpadd)&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span&gt;This works best if you can do it over more than one vector&amp;#39;s worth of bytes, so the later vpadds have more data to work with and can hide latency better.&lt;/span&gt;&lt;/div&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;</description></item><item><title>RE: How to shuffle bits and Check high bit value using Neon Intrinsics?</title><link>https://community.arm.com/thread/1283?ContentTypeID=1</link><pubDate>Wed, 11 Sep 2013 10:57:18 GMT</pubDate><guid 
isPermaLink="false">dd9e70c8-6d3c-4c71-b136-2456382a7b5c:a6c43e7d-2252-4b84-9c08-a753a27f8224</guid><dc:creator>Marcus Harnisch</dc:creator><description>&lt;div&gt;&lt;i&gt;Note: This was originally posted on 2nd November 2011 at &lt;a href="http://forums.arm.com"&gt;http://forums.arm.com&lt;/a&gt;&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;&lt;span&gt;You will want to have a look at the VTBL/VTBX instructions, which seem to do the same thing, only for vectors of size 8. Perhaps some of these could be combined to cover larger vectors.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span&gt;Best of luck&lt;/span&gt;&lt;br /&gt;&lt;span&gt;Marcus&lt;/span&gt;&lt;/div&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;</description></item><item><title>RE: How to shuffle bits and Check high bit value using Neon Intrinsics?</title><link>https://community.arm.com/thread/1282?ContentTypeID=1</link><pubDate>Wed, 11 Sep 2013 10:57:18 GMT</pubDate><guid isPermaLink="false">dd9e70c8-6d3c-4c71-b136-2456382a7b5c:360a9b67-d3af-4de4-a914-029903eb1832</guid><dc:creator>Etienne SOBOLE</dc:creator><description>&lt;div&gt;&lt;i&gt;Note: This was originally posted on 2nd November 2011 at &lt;a href="http://forums.arm.com"&gt;http://forums.arm.com&lt;/a&gt;&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;&lt;span&gt;Oops...&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span&gt;You said shuffle, not random.&lt;/span&gt;&lt;br /&gt;&lt;span&gt;Sorry. 
My suggestion is wrong in this case!&lt;/span&gt;&lt;/div&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;</description></item><item><title>RE: How to shuffle bits and Check high bit value using Neon Intrinsics?</title><link>https://community.arm.com/thread/1281?ContentTypeID=1</link><pubDate>Wed, 11 Sep 2013 10:57:17 GMT</pubDate><guid isPermaLink="false">dd9e70c8-6d3c-4c71-b136-2456382a7b5c:5c95b3fb-a093-40c2-ab2e-95229eedb1ec</guid><dc:creator>Etienne SOBOLE</dc:creator><description>&lt;div&gt;&lt;i&gt;Note: This was originally posted on 1st November 2011 at &lt;a href="http://forums.arm.com"&gt;http://forums.arm.com&lt;/a&gt;&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;&lt;br /&gt;Is there any equivalent of the &lt;a href="http://msdn.microsoft.com/en-us/library/yyhs9sh7.aspx" rel="nofollow"&gt;_mm_shuffle_epi8&lt;/a&gt; function in SSE3? Any suggestions on how to implement this would be really appreciated, since I can&amp;#39;t seem to get past this. I know that a lookup table exists, but it does not do an initial comparison like the _mm_shuffle, so I am not sure how to implement this.&lt;br /&gt;&lt;/blockquote&gt;&lt;br /&gt;&lt;br /&gt;&lt;span&gt;There is no such instruction in NEON, but you can do it yourself.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span&gt;One simple random-number algorithm is just&lt;/span&gt;&lt;br /&gt;&lt;span&gt;x&lt;sub&gt;n+1&lt;/sub&gt; = (1664525 * x&lt;sub&gt;n&lt;/sub&gt; + 1013904223) mod 2&lt;sup&gt;32&lt;/sup&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;code&gt;&lt;br /&gt;&lt;br /&gt;.alea_values:&lt;br /&gt; .word 123456789, 369258147&amp;#160; @ d0 : random value&lt;br /&gt; .word 1664525, 1664525&amp;#160;&amp;#160; @ d1 : random multiplier&lt;br /&gt; .word 1013904223, 1013904223 @ d2 : random increment&lt;br /&gt;...&lt;br /&gt;&lt;br /&gt;@ init&lt;br /&gt;adr&amp;#160;&amp;#160; r0, .alea_values&lt;br /&gt;vld1.u32&amp;#160; {d0 - d2}, [r0] &lt;br /&gt;&lt;br /&gt;@ compute&lt;br /&gt;vmul.u32 d0, d0, d1&lt;br /&gt;vadd.u32 d0, d0, d2&lt;br /&gt;&lt;br /&gt;@ store&lt;br 
/&gt;vst1.u32&amp;#160; {d0}, [r0] &lt;br /&gt;&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;&lt;span&gt;After computing, you should have a nice 64-bit random value in d0. You can then of course use it as 8-, 16-, 32- or 64-bit values.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span&gt;Of course, you don&amp;#39;t have to load and store the registers every time.&lt;/span&gt;&lt;br /&gt;&lt;span&gt;If you are converting SSE3 code, you should have plenty of NEON registers free. You can use 3 of them to keep the random coefficients.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span&gt;PS: I haven&amp;#39;t tested this code, but it should work &lt;/span&gt;&lt;a href="http://forums.arm.com/public/style_emoticons/default/wink.gif"&gt;&lt;img alt=";)" src="http://forums.arm.com/public/style_emoticons/default/wink.gif" /&gt;&lt;/a&gt;&lt;/div&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;</description></item></channel></rss>