Vectorizing Compiler

Note: This was originally posted on 29th June 2010 at http://forums.arm.com

Hi,

Please see my toolchain settings below:

CPP=arm-none-linux-gnueabi-gcc
SWS=-march=armv7-a -mtune=cortex-a8 -mfpu=neon -mfloat-abi=softfp -flax-vector-conversions
The target is a BeagleBoard.

How can I disable vectorization?
With the settings above, the compiler generates vectorized code by default for the given C source.
If I write NEON C intrinsics, will the compiler set aside its own optimization and follow the programmer's NEON code?

Please help me resolve these questions.
  • Note: This was originally posted on 30th June 2010 at http://forums.arm.com

    How can I disable vectorization?


    I can't tell what version of gcc you are using from the information above (gcc --version), but in recent versions, using '-O3' implies '-ftree-vectorize'.  Are you using '-O3'?

    If you want to disable vectorization then you probably want to use '-fno-tree-vectorize'.

    I'm curious:  why do you want to disable vectorization?
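As a rough illustration of the flags mentioned above, here is a minimal sketch of the kind of loop the vectorizer usually targets. The function name `scale_add` and the file name `scale_add.c` are made up for this example, and whether the loop is actually vectorized depends on the gcc version and target options.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical example loop (not from the original post).
 *
 * Possible build lines, assuming the file is saved as scale_add.c:
 *   vectorizer allowed:   arm-none-linux-gnueabi-gcc -O3 -mfpu=neon -mfloat-abi=softfp -c scale_add.c
 *   vectorizer disabled:  arm-none-linux-gnueabi-gcc -O3 -fno-tree-vectorize -mfpu=neon -mfloat-abi=softfp -c scale_add.c
 */
void scale_add(int32_t *dst, const int32_t *src, int32_t k, size_t n)
{
    size_t i;

    assert(dst != NULL && src != NULL);

    /* Independent iterations over contiguous data: a typical candidate
       for automatic vectorization. */
    for (i = 0; i < n; i++)
        dst[i] += k * src[i];
}
```

Building the same file both ways and comparing the disassembly is one way to see what '-fno-tree-vectorize' changes.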
  • Note: This was originally posted on 1st July 2010 at http://forums.arm.com

    [...]
    Then in my IMViewer application I measure the performance of both versions.
    The code fragment is given below:

    [...]
    void main(int argc, char**argv)
    {
      gettimeofday(&First, NULL);
      [...]
    }


    I'd suggest using 'times()' or 'getrusage(RUSAGE_SELF, ...)' instead of 'gettimeofday()'.  gettimeofday() reports wall-clock time, so your measurement also includes time during which other processes were running, not just yours.  The other functions report the CPU time used by your own process, so they should be less susceptible to interference from outside sources and give you more consistent numbers.  But it may not make much difference on a quiet system.

    And the pedant in me says, that should be 'int main() { ... return 0; }'  -- 'void main() { ... }' isn't really legal.  But that's not causing any timing difference.
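A minimal sketch of the getrusage() approach suggested above. The helper names `cpu_seconds` and `busy_work` are invented for this example, and `busy_work` only stands in for the code being measured.

```c
#include <assert.h>
#include <sys/resource.h>
#include <sys/time.h>

/* CPU time (user + system) consumed so far by this process, in seconds.
   Returns -1.0 if getrusage() fails. */
double cpu_seconds(void)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1.0;

    return (double)ru.ru_utime.tv_sec + (double)ru.ru_utime.tv_usec / 1e6
         + (double)ru.ru_stime.tv_sec + (double)ru.ru_stime.tv_usec / 1e6;
}

/* Hypothetical workload standing in for the code under test. */
double busy_work(long iterations)
{
    volatile double acc = 0.0;
    long i;

    assert(iterations >= 0);

    for (i = 0; i < iterations; i++)
        acc += (double)i * 0.5;

    return acc;
}
```

Call cpu_seconds() immediately before and after the code being measured, from an 'int main(void)' that returns 0, and subtract the two values to get the CPU time charged to your process.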

    But sadly the performance of version 2 is not good. It is close to the C version. I can't spot
    the problem here!


    Since you're specifying -O3 for the C version, gcc may be doing vectorization.  You can add -ftree-vectorizer-verbose=2 and look for 'LOOP VECTORIZED' in gcc's messages.  Or you can 'arm-...-objdump -d' the .o file (or even the executable?) and look for the vector instructions.
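To make the comparison concrete, here is a minimal sketch of a plain C loop next to a hand-written NEON intrinsics version of the same operation. The function names `add_scalar` and `add_neon` are made up for this example and are not the code from the thread. If gcc vectorizes the plain loop at -O3, the two versions can end up with very similar machine code, which would explain similar timings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0   /* not built for NEON: add_neon falls back to scalar code */
#endif

/* Plain C version: at -O3 the compiler may vectorize this loop by itself. */
void add_scalar(const int32_t *a, const int32_t *b, int32_t *out, size_t n)
{
    size_t i;

    assert(a != NULL && b != NULL && out != NULL);

    for (i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}

/* Hand-written version: four 32-bit integers per iteration using intrinsics. */
void add_neon(const int32_t *a, const int32_t *b, int32_t *out, size_t n)
{
    size_t i = 0;

    assert(a != NULL && b != NULL && out != NULL);

#if HAVE_NEON
    for (; i + 4 <= n; i += 4) {
        int32x4_t va = vld1q_s32(a + i);
        int32x4_t vb = vld1q_s32(b + i);
        vst1q_s32(out + i, vaddq_s32(va, vb));
    }
#endif
    /* Remaining elements (or the whole array when NEON is unavailable). */
    for (; i < n; i++)
        out[i] = a[i] + b[i];
}
```

Disassembling both functions with objdump and looking for NEON instructions that use q registers (for example 'vadd.i32') in each one shows whether the compiler has already vectorized the plain version.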


    The following questions remain:

    1. Can I configure the L1 and L2 cache sizes in the OS kernel?

    No, the kernel should enable and deal with the caches -- that's part of its job.

    2. Is any hand-written assembly needed to enable the NEON unit on the BeagleBoard?

    That's also the kernel's job.  If you executed a NEON instruction with a kernel that had NEON disabled, I'd expect your process to be killed by SIGILL.  'uname -a' will tell us the kernel version number.
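One way to check from user space whether the kernel reports NEON is to look at the "Features" line of /proc/cpuinfo. The sketch below assumes the usual 32-bit ARM Linux layout of that file; the function name `cpuinfo_has_neon` is invented for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if a "Features" line in the given cpuinfo-style file lists
   "neon", 0 if it does not, and -1 if the file cannot be opened. */
int cpuinfo_has_neon(const char *path)
{
    char line[1024];
    int found = 0;
    FILE *f;

    assert(path != NULL);

    f = fopen(path, "r");
    if (f == NULL)
        return -1;

    while (fgets(line, sizeof line, f) != NULL) {
        /* Assumed format: "Features : swp half thumb ... neon ..." */
        if (strncmp(line, "Features", 8) == 0 && strstr(line, " neon") != NULL) {
            found = 1;
            break;
        }
    }

    fclose(f);
    return found;
}
```

On the target you would call cpuinfo_has_neon("/proc/cpuinfo"). A result of 1 suggests the kernel has NEON enabled for user processes.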

    3. My gcc version is Red Hat 3.4.4-2

    That looks like the host compiler.  I should have said 'arm-none-linux-gnueabi-gcc --version'
  • Note: This was originally posted on 7th July 2010 at http://forums.arm.com

    Dear Scott,

    The toolchain version is given below:

    (2007q3-51) 4.2.1

    Will I get NEON performance with this version of the toolchain?

    I have one more question: can my code get into the cache?

    Can the OS's critical modules use the cache all the time?

    Dave
  • Note: This was originally posted on 12th July 2010 at http://forums.arm.com

    The toolchain version is given below:

    (2007q3-51) 4.2.1

    Will I get NEON performance with this version of the toolchain?


    I expect that if you are using '-O3 -mcpu=cortex-a8 -mfpu=neon -mfloat-abi=softfp' that 2007q3-51 will try to vectorize.  You can use objdump to find out how well it is doing.  You should probably consider using 2010q1 as it's 2.5 years newer.

    I have one more question: can my code get into the cache?

    Can the OS's critical modules use the cache all the time?


    Your code will share the cache with other processes and the OS.  If the OS and other processes aren't executing much then your code should stay in the cache (if it fits).