• Compiler optimization options for the ARMv8 GCC compiler on ARM Cortex-A53 (bare-metal application)

    I am using the ARMv8 GCC compiler (aarch64-none-elf-gcc) for my bare-metal application on an ARM Cortex-A53. My code uses NEON intrinsics with plain C, so I would like to make sure I use every optimization option available for this compiler.

    I tried -mfpu…

  • Optimization of NEON intrinsics on ARM Cortex-A53

    I am using the ARMv8 GCC compiler and would like to optimize NEON intrinsics code for better execution time. I have already tried loop unrolling, and I am using a lookup table to compute log10. Any ideas?

    Here is the code:

    static inline…

  • Partial register dependencies on NEON

    I'm having trouble finding any information on partial NEON register dependencies.

    Take for example the following code:

    ld2 {v0.16b, v1.16b}[0], [x0]
    ld2 {v0.16b, v1.16b}[1], [x1]
    ld2 {v0.16b, v1.16b}[2], [x2]
    ...

    Does the second load have to wait…

  • Float behavior on AARCH64

    Hello,

    Forgive me if my question is a little weak in content and language. I'm only a hobbyist, and English is not my native language.

    I'm trying to compile an app from Einstein@Home for AARCH64 using GCC. Einstein@Home is a distributed computing project that uses BOINC. The App…

  • Why was the coprocessor removed in A64?

    From an architectural point of view, why was the coprocessor removed from the A64 instruction set?

  • AARCH64 assembly syntax for ARMCLANG

    Hello,

    Where can I get documentation for the AARCH64 and NEON64 assembly syntax for armclang (internal assembler)?

    I have some issues when compiling my GNU assembly code with armclang.

    For example, the instruction:

    MOV v0.2d[0], x4

    reports "error: invalid…