<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="https://community.arm.com/utility/feedstylesheets/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>Vectorizing Compiler</title><link>https://community.arm.com/developer/tools-software/tools/f/armds-forum/529/vectorizing-compiler</link><description> Note: This was originally posted on 29th June 2010 at http://forums.arm.com Hi, Please see the following tool chain CPP=arm-none-linux-gnueabi-gcc SWS=-march=armv7-a -mtune=cortex-a8 -mfpu=neon -mfloat-abi=softfp -flax-vector-conversions Target is beegle</description><dc:language>en-US</dc:language><generator>Telligent Community 10</generator><item><title>RE: Vectorizing Compiler</title><link>https://community.arm.com/thread/1145?ContentTypeID=1</link><pubDate>Wed, 11 Sep 2013 10:56:35 GMT</pubDate><guid isPermaLink="false">dd9e70c8-6d3c-4c71-b136-2456382a7b5c:06b50c94-77fc-477d-b072-67e3d5c7c1e7</guid><dc:creator>Scott Douglass</dc:creator><description>&lt;div&gt;&lt;i&gt;Note: This was originally posted on 12th July 2010 at &lt;a href="http://forums.arm.com"&gt;http://forums.arm.com&lt;/a&gt;&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;the tool chain&amp;#160; version is given below &lt;br /&gt;&lt;br /&gt;(2007q3-51) 4.2.1&lt;br /&gt;&lt;br /&gt;will i get neon performance by this version of tool chain !&lt;/blockquote&gt;&lt;br /&gt;&lt;br /&gt;&lt;span&gt;I expect that if you are using &amp;#39;-O3 -mcpu=cortex-a8 -mfpu=neon -mfloat-abi=softfp&amp;#39; that 2007q3-51 will try to vectorize.&amp;#160; You can use objdump to find out how well it is doing.&amp;#160; You should probably consider using 2010q1 as it&amp;#39;s 2.5 years newer.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;I have one doubt will my code can enter the cache memory..?&lt;br /&gt;&lt;br /&gt;The OS critical module can use the cache all 
the time.?&lt;/blockquote&gt;&lt;br /&gt;&lt;br /&gt;&lt;span&gt;Your code will share the cache with other processes and the OS.&amp;#160; If the OS and other processes aren&amp;#39;t executing much then your code should stay in the cache (if it fits).&lt;/span&gt;&lt;/div&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;</description></item><item><title>RE: Vectorizing Compiler</title><link>https://community.arm.com/thread/1144?ContentTypeID=1</link><pubDate>Wed, 11 Sep 2013 10:56:34 GMT</pubDate><guid isPermaLink="false">dd9e70c8-6d3c-4c71-b136-2456382a7b5c:0f0139c8-d466-44cd-b5e1-53042a6ba80d</guid><dc:creator>Dave Mathew</dc:creator><description>&lt;div&gt;&lt;i&gt;Note: This was originally posted on 7th July 2010 at &lt;a href="http://forums.arm.com"&gt;http://forums.arm.com&lt;/a&gt;&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;&lt;span&gt;Dear scott,&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span&gt;the tool chain&amp;#160; version is given below &lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span&gt;(2007q3-51) 4.2.1&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span&gt;will i get neon performance by this version of tool chain !&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span&gt;I have one doubt will my code can enter the cache memory..?&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span&gt;The OS critical module can use the cache all the time.?&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span&gt;Dave&lt;/span&gt;&lt;/div&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;</description></item><item><title>RE: Vectorizing Compiler</title><link>https://community.arm.com/thread/1143?ContentTypeID=1</link><pubDate>Wed, 11 Sep 2013 10:56:34 GMT</pubDate><guid isPermaLink="false">dd9e70c8-6d3c-4c71-b136-2456382a7b5c:b9f0587f-161c-4c84-965e-a5ccd84e6547</guid><dc:creator>Scott Douglass</dc:creator><description>&lt;div&gt;&lt;i&gt;Note: This was originally posted on 1st July 2010 at &lt;a href="http://forums.arm.com"&gt;http://forums.arm.com&lt;/a&gt;&lt;/i&gt;&lt;br /&gt;&lt;br 
/&gt;&lt;blockquote&gt;&lt;code&gt;[...]&lt;br /&gt;Then in my IMViewer application i take the performence of both versions&lt;br /&gt;the code fragment is given below&lt;br /&gt;&lt;br /&gt;[...]&lt;br /&gt;void main(int argc, char**argv)&lt;br /&gt;{&lt;br /&gt;&amp;#160; gettimeofday(&amp;amp;First, NULL);&lt;br /&gt;&amp;#160; [...]&lt;br /&gt;}&lt;/code&gt;&lt;/blockquote&gt;&lt;br /&gt;&lt;br /&gt;&lt;span&gt;I&amp;#39;d suggest using &amp;#39;times()&amp;#39; or &amp;#39;getrusage(RUSAGE_SELF, ...)&amp;#39; instead of &amp;#39;gettimeofday()&amp;#39;, since gettimeofday measures wall-clock time, which includes time spent running other processes, not just yours.&amp;#160; The other functions should be less susceptible to interference from outside sources and give you more consistent numbers.&amp;#160; But it may not make much difference on a quiet system.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span&gt;And the pedant in me says that should be &amp;#39;int main() { ... return 0; }&amp;#39;&amp;#160; -- &amp;#39;void main() { ... }&amp;#39; isn&amp;#39;t really legal.&amp;#160; But that&amp;#39;s not causing any timing difference.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;But sadly the performance for version 2 is not good. It is near to C version. I don&amp;#39;t spot&lt;br /&gt;what is the problem here !&lt;/blockquote&gt;&lt;br /&gt;&lt;br /&gt;&lt;span&gt;Since you&amp;#39;re specifying -O3 for the C version, gcc may be doing vectorization.&amp;#160; You can add -ftree-vectorizer-verbose=2 and look for &amp;#39;LOOP VECTORIZED&amp;#39; in gcc&amp;#39;s messages.&amp;#160; Or you can &amp;#39;arm-...-objdump -d&amp;#39; the .o file (or even the executable?) and look for the vector instructions.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;Following doubts still exists&lt;br /&gt;&lt;br /&gt;1. 
Will i can configure the L1 and L2 cache size of OS kernel?&lt;/blockquote&gt;&lt;br /&gt;&lt;span&gt;No, the kernel should enable and deal with the caches -- that&amp;#39;s part of its job.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;2. Is there any hand&amp;#160; written assembly is needed for enable the Neon processor of beegle board&lt;/blockquote&gt;&lt;br /&gt;&lt;span&gt;That&amp;#39;s also the kernel&amp;#39;s job.&amp;#160; If you executed a NEON instruction with a kernel that had NEON disabled, I&amp;#39;d expect your process to be killed by SIGILL.&amp;#160; &amp;#39;uname -a&amp;#39; will tell us the kernel version number.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;3. My gcc version is Red Hat 3.4.4-2&lt;/blockquote&gt;&lt;br /&gt;&lt;span&gt;That looks like the host compiler.&amp;#160; I should have said &amp;#39;arm-none-linux-gnueabi-gcc --version&amp;#39;&lt;/span&gt;&lt;/div&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;</description></item><item><title>RE: Vectorizing Compiler</title><link>https://community.arm.com/thread/1142?ContentTypeID=1</link><pubDate>Wed, 11 Sep 2013 10:56:34 GMT</pubDate><guid isPermaLink="false">dd9e70c8-6d3c-4c71-b136-2456382a7b5c:28fc807a-f773-471a-9a1b-a674a45a42a2</guid><dc:creator>Dave Mathew</dc:creator><description>&lt;div&gt;&lt;/div&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;</description></item><item><title>RE: Vectorizing Compiler</title><link>https://community.arm.com/thread/1141?ContentTypeID=1</link><pubDate>Wed, 11 Sep 2013 10:56:34 GMT</pubDate><guid isPermaLink="false">dd9e70c8-6d3c-4c71-b136-2456382a7b5c:bf053e54-cd5a-4bd7-9e86-e10df70f9039</guid><dc:creator>Scott Douglass</dc:creator><description>&lt;div&gt;&lt;i&gt;Note: This was originally posted on 30th June 2010 at &lt;a href="http://forums.arm.com"&gt;http://forums.arm.com&lt;/a&gt;&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;How can i disable the vectorization.&lt;/blockquote&gt;&lt;br /&gt;&lt;br 
/&gt;&lt;span&gt;I can&amp;#39;t tell what version of gcc you are using from the information above (gcc --version), but in recent versions, using &amp;#39;-O3&amp;#39; implies &amp;#39;-ftree-vectorize&amp;#39;.&amp;#160; Are you using &amp;#39;-O3&amp;#39;?&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span&gt;If you want to disable vectorization then you probably want to use &amp;#39;-fno-tree-vectorize&amp;#39;.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span&gt;I&amp;#39;m curious:&amp;#160; why do you want to disable vectorization?&lt;/span&gt;&lt;/div&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;</description></item><item><title>RE: Vectorizing Compiler</title><link>https://community.arm.com/thread/1140?ContentTypeID=1</link><pubDate>Wed, 11 Sep 2013 10:56:34 GMT</pubDate><guid isPermaLink="false">dd9e70c8-6d3c-4c71-b136-2456382a7b5c:9aa4c6d8-595e-48e6-943a-4f920500ec68</guid><dc:creator>suvir bhargav</dc:creator><description>&lt;div&gt;&lt;/div&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;</description></item></channel></rss>