  • How can a 4-core ARM Cortex-A53 include NEON on only 2 cores?
    For cost reasons, our project wants only 2 of the cores to support NEON. How can I do this? 1. Can it be done within a single cluster? 2. Or should the design be split into 2 clusters, each with 2 cores? What is the difference between the performance...
  • How to access the NEON dot product intrinsic vdotq_s32
    I'm using clang 5 and have also tried clang 7. Neither seems to support the intrinsic vdotq_s32(c, a, b). I'm using VS2017 with Nvidia CodeWorks for Android integration. I debug using a Shield...
  • Memory copy using ARM NEON is not much better than memcpy (only a small improvement)
    I have re-implemented a (cropped) buffer copy using ARM NEON, but it does not seem to improve significantly on memcpy. https://godbolt.org/z/zv5aeTW1f - I can see ld4 and st4 instructions for arm...