We're trying to determine a minimal yet performant set of GCC builds for shipping Arm binaries of Julia (http://julialang.org).
Previously we've built two binaries with:
which has provided good broad support, but we wondered whether performance could be improved, given the speedups seen when building natively with `-mcpu=native`.
The systems we're particularly interested in supporting are:
- Raspberry Pi 4 (Cortex-A72)
- NVIDIA Jetson TX1, Nano (Cortex-A57 / Denver 2, Armv8)
- NVIDIA Xavier NX (NVIDIA Carmel, Armv8.2)
We generally advise building locally with `-mcpu=native`, but can improvements be made for our binaries?
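To make the per-target option concrete, here is a minimal sketch of what device-specific GCC invocations might look like. The `-mcpu=cortex-a72` and `-mcpu=cortex-a57` spellings are standard GCC AArch64 values; to my knowledge Carmel has no dedicated GCC `-mcpu` entry, so the Xavier NX line falls back to a plain `-march` baseline — treat that entry as an assumption.

```python
# Sketch: candidate GCC tuning flags for the boards listed above.
# The Carmel entry is an assumption (no GCC -mcpu=carmel that I know
# of), using a generic Armv8.2-A -march baseline instead.
TARGET_FLAGS = {
    "Raspberry Pi 4 (Cortex-A72)": "-mcpu=cortex-a72",
    "Jetson TX1 / Nano (Cortex-A57)": "-mcpu=cortex-a57",
    "Xavier NX (Carmel, Armv8.2-A)": "-march=armv8.2-a",  # assumption
}

def gcc_command(board: str, source: str = "foo.c") -> str:
    """Build a gcc invocation string for one of the boards above."""
    return f"gcc {TARGET_FLAGS[board]} -O2 {source}"

if __name__ == "__main__":
    for board in TARGET_FLAGS:
        print(f"{board:32s} {gcc_command(board)}")
```

The trade-off is the one the question raises: one baseline binary (lowest common denominator) versus a small matrix of per-`-mcpu` builds.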
P.S. Not sure if this is the right forum for this question.
I'm not familiar with Julia, but I hope I can answer generally.
All the above CPUs are Arm v8.0A or higher. I think it is reasonable to assume that all Arm based devices capable of supporting Julia will be at least v8.0A, as this brings 64-bit processing to Arm. Newer architecture extensions bring new instructions that may not be supported on all platforms. Do you expect one set of binaries to work on all the above platforms, or will they be recompiled by the user?
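One way to see which post-v8.0 extensions a given device actually exposes is to read the "Features" line of `/proc/cpuinfo` on Arm Linux. The sketch below does that with a graceful fallback on other hosts; the feature names (`atomics` for LSE, `asimdhp` for FP16 SIMD, `aes`) are examples of kernel-reported flags, and which ones appear depends on the kernel and CPU.

```python
# Sketch: check at runtime which Armv8 extensions the host CPU
# reports, by parsing the "Features" line of /proc/cpuinfo on Linux.
# On non-Arm or non-Linux hosts the line is absent and the set is
# empty, in which case only a v8.0-A baseline binary is safe.
def arm_features(cpuinfo_path: str = "/proc/cpuinfo") -> set:
    try:
        with open(cpuinfo_path) as f:
            for line in f:
                if line.lower().startswith("features"):
                    return set(line.split(":", 1)[1].split())
    except OSError:
        pass
    return set()

if __name__ == "__main__":
    feats = arm_features()
    # Post-v8.0 extensions that a lowest-common-denominator
    # binary cannot assume are present on every target.
    for ext in ("atomics", "asimdhp", "aes"):
        print(f"{ext}: {'yes' if ext in feats else 'no / unknown'}")
```

A shipped binary could use a check like this to select between code paths, rather than requiring separate builds per platform.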
For general info on Armv8-A (and a comparison with older architectures), see: https://developer.arm.com/architectures/cpu-architecture/a-profile
This document gives a useful overview of the different features per extension: