I am using the ARMv8 GCC compiler (aarch64-none-elf-gcc) for my bare-metal application on an Arm Cortex-A53. I am using NEON intrinsics in plain C, so I would like to make sure I use all the optimization options available for this compiler.
I tried -mfpu…
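Since the excerpt breaks off at `-mfpu`, one point worth noting: the AArch64 port of GCC has no `-mfpu` option. Floating point and Advanced SIMD are controlled through `-march`/`-mcpu` and their `+simd`/`+nosimd` feature modifiers, so `-mcpu=cortex-a53` together with `-O2` or `-O3` is a typical starting point (an assumption, not a flag set from the original post). The sketch below is a hypothetical helper, not anything from the post: it records a few ACLE predefined macros so the application can confirm what its build flags actually enabled.

```c
/* Sketch: record which target features the current compiler flags enabled.
   The macros are ACLE predefined macros; on a non-Arm host all fields are 0. */
struct target_features {
    int aarch64;  /* __aarch64__: generating A64 code */
    int neon;     /* __ARM_NEON: Advanced SIMD (NEON) intrinsics available */
    int fma;      /* __ARM_FEATURE_FMA: fused multiply-add available */
};

static struct target_features query_target_features(void) {
    struct target_features f = {0, 0, 0};
#if defined(__aarch64__)
    f.aarch64 = 1;
#endif
#if defined(__ARM_NEON)
    f.neon = 1;
#endif
#if defined(__ARM_FEATURE_FMA)
    f.fma = 1;
#endif
    return f;
}
```

On a bare-metal target without a console, inspecting the returned struct in a debugger is enough to check that the intended flags reached the compiler.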
I am using the ARMv8 GCC compiler and would like to optimize my NEON intrinsics code for better execution time. I have already tried loop unrolling, and I am using a lookup table to compute log10. Any ideas?
Here is the code:
static inline…
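The original code is cut off, so the following is not the author's implementation, only a minimal scalar sketch of one common way to build a table-driven log10 under stated assumptions: positive, finite, normal inputs; a table of log10(1 + i/256) for i = 0..256; and linear interpolation between entries. It splits the float into exponent and mantissa with integer bit operations, which is the part that maps naturally onto NEON shifts and masks; NEON has no gather load, so the table read itself typically stays per-lane. `LOG_TABLE_BITS`, `log10_table_init` and `log10_lut` are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

/* Assumptions: x is positive, finite and normal; 2^LOG_TABLE_BITS buckets. */
#define LOG_TABLE_BITS 8
#define LOG_TABLE_SIZE (1u << LOG_TABLE_BITS)
#define LOG10_2 0.30102999566398120

static float log10_table[LOG_TABLE_SIZE + 1];  /* log10(1 + i/N), i = 0..N */

/* ln(y) for y in [1, 2] via ln(y) = 2*atanh(z), z = (y-1)/(y+1) <= 1/3.
   Used only to fill the table once, so no libm dependency is needed. */
static double ln_1_to_2(double y) {
    double z = (y - 1.0) / (y + 1.0), z2 = z * z, term = z, sum = 0.0;
    for (int k = 1; k < 60; k += 2) {
        sum += term / k;   /* z^k / k */
        term *= z2;
    }
    return 2.0 * sum;
}

static void log10_table_init(void) {
    const double ln10 = 2.302585092994045684;
    for (unsigned i = 0; i <= LOG_TABLE_SIZE; ++i)
        log10_table[i] = (float)(ln_1_to_2(1.0 + (double)i / LOG_TABLE_SIZE) / ln10);
}

static inline float log10_lut(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);                     /* reinterpret float bits */
    int32_t e = (int32_t)((bits >> 23) & 0xFFu) - 127;  /* unbiased exponent */
    uint32_t mant = bits & 0x7FFFFFu;                   /* 23-bit mantissa field */
    uint32_t idx = mant >> (23 - LOG_TABLE_BITS);       /* top bits pick the bucket */
    uint32_t rest = mant & ((1u << (23 - LOG_TABLE_BITS)) - 1u);
    float t = (float)rest * (1.0f / (float)(1u << (23 - LOG_TABLE_BITS)));
    float lo = log10_table[idx], hi = log10_table[idx + 1];
    /* log10(x) = e*log10(2) + log10(1.mantissa), interpolated within the bucket */
    return (float)e * (float)LOG10_2 + lo + t * (hi - lo);
}
```

Call `log10_table_init()` once at startup. Table size and interpolation trade memory for accuracy; both are assumptions to tune rather than recommendations.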
I'm having trouble finding any information on partial NEON register dependencies.
Take for example the following code:
ld2 {v0.16b, v1.16b}[0], [x0]
ld2 {v0.16b, v1.16b}[1], [x1]
ld2 {v0.16b, v1.16b}[2], [x2]
...
Does the second load have to wait…
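Why a dependency is at least architecturally plausible: the single-structure (lane) form of LD2, written in the Arm ARM as `ld2 {v0.b, v1.b}[0], [x0]`, writes only the selected lane and leaves the other lanes of both destination registers unchanged, so each load consumes the previous register contents as well as producing new ones. The ACLE intrinsic for this form, `vld2q_lane_u8`, makes that explicit by taking the old register pair as an argument. The portable C model below (type and function names are hypothetical) shows the merge; whether the Cortex-A53 pipeline actually serializes back-to-back lane loads is a microarchitectural question it does not answer.

```c
#include <stdint.h>

/* Hypothetical model of two 128-bit registers viewed as 16 byte lanes each. */
typedef struct {
    uint8_t v0[16];
    uint8_t v1[16];
} reg_pair_u8;

/* Model of `ld2 {v0.b, v1.b}[lane], [ptr]` (lane must be 0..15): reads two
   consecutive bytes, writes v0[lane] and v1[lane], and keeps every other lane.
   The result depends on `in`, i.e. a read-modify-write of both registers. */
static reg_pair_u8 ld2_lane_u8_model(reg_pair_u8 in, const uint8_t *ptr, int lane) {
    reg_pair_u8 out = in;   /* old contents flow into the result */
    out.v0[lane] = ptr[0];  /* first element of the 2-element structure */
    out.v1[lane] = ptr[1];  /* second element of the 2-element structure */
    return out;
}
```

With intrinsics the same chain reads `p = vld2q_lane_u8(ptr, p, lane)`, so every call's input is the previous call's output. Loading into independent registers and combining afterwards removes that chain in the source, but whether it is faster on Cortex-A53 would need to be measured.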
Hello,
Forgive me if my question is a little weak in content and language. I'm only a hobbyist, and English is not my native language.
I'm trying to compile an app from Einstein@Home for AArch64 using GCC. Einstein@Home is a distributed computing (DC) project using BOINC. The app…
From an architectural point of view, why was the coprocessor interface removed from the A64 instruction set?
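For context on what replaced the coprocessor interface: in AArch32, system registers are reached through CP15 coprocessor instructions such as `MRC p15, 0, <Rt>, c0, c0, 0` (Main ID Register), while A64 has no coprocessor instructions and names system registers directly in `MRS`/`MSR`, e.g. `MRS <Xt>, MIDR_EL1`. The sketch below puts both forms side by side. It assumes a privileged, bare-metal context (user-mode code may not be allowed to read MIDR) and returns 0 on non-Arm hosts; `read_midr`, `midr_fields` and `decode_midr` are hypothetical names, and the field decoder is plain C that runs anywhere.

```c
#include <stdint.h>

/* Read the Main ID Register. Assumes privileged execution (EL1 / PL1). */
static uint64_t read_midr(void) {
#if defined(__aarch64__)
    uint64_t v;
    __asm__ volatile("mrs %0, midr_el1" : "=r"(v));           /* A64: named system register */
    return v;
#elif defined(__arm__)
    uint32_t v;
    __asm__ volatile("mrc p15, 0, %0, c0, c0, 0" : "=r"(v));  /* AArch32: CP15 access */
    return v;
#else
    return 0;  /* not an Arm target: nothing to read */
#endif
}

/* MIDR fields (same layout in AArch32 MIDR and AArch64 MIDR_EL1 bits [31:0]). */
typedef struct {
    unsigned implementer;   /* [31:24], 0x41 = Arm */
    unsigned variant;       /* [23:20], the N in rNpM */
    unsigned architecture;  /* [19:16] */
    unsigned partnum;       /* [15:4], 0xD03 = Cortex-A53 */
    unsigned revision;      /* [3:0], the M in rNpM */
} midr_fields;

static midr_fields decode_midr(uint64_t midr) {
    midr_fields f;
    f.implementer  = (unsigned)((midr >> 24) & 0xFF);
    f.variant      = (unsigned)((midr >> 20) & 0xF);
    f.architecture = (unsigned)((midr >> 16) & 0xF);
    f.partnum      = (unsigned)((midr >> 4) & 0xFFF);
    f.revision     = (unsigned)(midr & 0xF);
    return f;
}
```

For example, `decode_midr(0x410FD034)` gives implementer 0x41 (Arm), part number 0xD03 (Cortex-A53), variant 0 and revision 4, i.e. r0p4.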
Where can I find documentation for the AArch64 and NEON64 assembly syntax accepted by armclang's integrated assembler?
I have some issues when compiling my GNU assembly code with armclang.
For example, the instruction:
MOV v0.2d[0], x4
reports "error: invalid…
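Without the rest of the error message this is only a guess, but in the architectural syntax the lane form of this instruction takes the element size without a lane count: `MOV v0.d[0], x4`, an alias of `INS v0.d[0], x4`. A minimal sketch, assuming GCC-style inline assembly (which clang-based armclang also accepts), that uses that spelling and falls back to plain C on non-AArch64 hosts; the `u64x2` type and `insert_lane0` function are hypothetical.

```c
#include <stdint.h>
#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* Hypothetical portable stand-in for a 2 x 64-bit vector. */
typedef struct { uint64_t lane[2]; } u64x2;

/* Insert x into lane 0 and keep lane 1. */
static u64x2 insert_lane0(u64x2 v, uint64_t x) {
#if defined(__aarch64__)
    uint64x2_t t = vld1q_u64(v.lane);
    /* Lane syntax: element size only (.d[0]), not the arrangement (.2d[0]). */
    __asm__("mov %0.d[0], %1" : "+w"(t) : "r"(x));
    vst1q_u64(v.lane, t);
#else
    v.lane[0] = x;  /* same semantics on non-AArch64 hosts */
#endif
    return v;
}
```

In C code the intrinsic `vsetq_lane_u64(x, t, 0)` expresses the same lane insert without inline assembly.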