Hi
I've recently been working on a Linux application which uses the LLVM C++ API to generate IR targeted at Morello pure-cap and then compile this to C64 object code.
I realised that the AArch64 backend target modifications for Morello utilise a non-default address space, 200, for capabilities - everything works fine though (albeit it was challenging, as I am running the compilation on x86_64 but generating IR with a Morello pure-cap data layout).
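For context, this is roughly the shape of what I'm doing - a minimal sketch rather than my actual code, and the triple is illustrative (I take the authoritative triple and data-layout strings from what Morello clang itself emits):

```cpp
// Minimal sketch: emitting IR that uses AS 200 (capability) pointers via the
// LLVM C++ API. Illustrative only - the real triple/data-layout should be
// taken from Morello clang's output for a pure-cap compile.
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"
#include "llvm/Support/raw_ostream.h"

using namespace llvm;

static constexpr unsigned CapAS = 200; // Morello capability address space

int main() {
  LLVMContext Ctx;
  Module M("morello_demo", Ctx);
  M.setTargetTriple("aarch64-unknown-linux-gnu"); // illustrative
  // M.setDataLayout("...");  // paste the pure-cap layout printed by clang

  IRBuilder<> B(Ctx);
  // i32 addrspace(200)* - a capability pointer to i32.
  PointerType *CapPtrTy = PointerType::get(Type::getInt32Ty(Ctx), CapAS);
  FunctionType *FT = FunctionType::get(B.getInt32Ty(), {CapPtrTy}, false);
  Function *F =
      Function::Create(FT, Function::ExternalLinkage, "load_elem", &M);
  B.SetInsertPoint(BasicBlock::Create(Ctx, "entry", F));

  // GEP offsets in AS 200 are computed at the *index* width (currently 64
  // bits, per the discussion below), although the pointer itself is 128-bit.
  Value *Elt = B.CreateGEP(B.getInt32Ty(), F->getArg(0), B.getInt64(4));
  B.CreateRet(B.CreateLoad(B.getInt32Ty(), Elt));

  M.print(outs(), nullptr);
  return 0;
}
```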
However, I note that for the fat pointers of AS 200, the index size is set to 64 bits, not 128 bits. This prevents the load/store vectorisation pass from running during IR optimisation (the LLVM library asserts if pointer_size != index_size).
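For anyone who wants to see the mismatch directly, it's queryable from the DataLayout API. A small sketch - the layout string here is cut down and illustrative (the pointer component syntax is p&lt;AS&gt;:&lt;size&gt;:&lt;abi&gt;[:&lt;pref&gt;[:&lt;idx&gt;]]), not the full Morello pure-cap layout:

```cpp
// Sketch: query pointer width vs. index width for AS 200. The layout string
// is illustrative, not the full Morello pure-cap data layout.
#include "llvm/IR/DataLayout.h"
#include "llvm/Support/raw_ostream.h"

using namespace llvm;

int main() {
  // p200:128:128:128:64 - 128-bit pointers in AS 200 with a 64-bit index.
  DataLayout DL("e-p200:128:128:128:64");
  outs() << "pointer bits: " << DL.getPointerSizeInBits(200) << "\n"; // 128
  outs() << "index bits:   " << DL.getIndexSizeInBits(200) << "\n";   // 64
  // This inequality is what trips the vectorisation pass's assertion.
  return 0;
}
```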
Q1: From comments in the code, it looks like using a 64-bit index size instead of 128-bit for capability pointers in the data layout is a temporary measure. Do Arm have a timescale for when it can be updated to 128 bits?
Q2: This second question is really aimed at the community in general: does anyone have experience of whether skipping this optimisation pass makes any real-world difference to the performance of compiled IR on Morello? From LLVM code comments it looks like AArch64 won't really benefit from it anyway. I haven't yet done any tests of my own due to other work commitments, but it is something I was planning to investigate (on Morello hybrid, probably). It would be interesting to hear if Arm or anyone else has input on this; if not, I'll update if/when I get any useful metrics myself.
Many Thanks
Pete
PeteD said:
2. Clearly, then, I misunderstood the comments below in llvm::DataLayout::parseSpecifier():
```cpp
// Size of index used in GEP for address calculation.
// The parameter is optional. By default it is equal to size of pointer.
// XXXAR: For compatibility make isFat default to index width = 64 bits so
// we don't have to add the index width to the datalayout immediately
unsigned IndexSize = isFat ? 8 : PointerMemSize;
```
I (now clearly erroneously) interpreted this comment as saying "we don't have to implement this right away, we'll do it later".
This is:
* We didn't use to specify the index size, but had our own hacks to introduce an inferred one
* Upstream LLVM added a proper notion of index width, so we made it default to the right value for CHERI capabilities
* We went and explicitly specified it everywhere
<--- You are / Morello LLVM is here
* We went and made it required for capabilities rather than inferred (long-overdue)
<--- CHERI LLVM is here as of November last year
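In data-layout terms (using the upstream p&lt;AS&gt;:&lt;size&gt;:&lt;abi&gt;:&lt;pref&gt;:&lt;idx&gt; spelling for illustration - the CHERI fat-pointer component may be spelled differently), that progression looks roughly like:

```
p200:128:128:128      ; index width omitted - inferred as 64 bits for fat pointers
p200:128:128:128:64   ; index width spelled out explicitly   <-- Morello LLVM
p200:128:128:128:64   ; same string, but omitting :64 is now an error  <-- CHERI LLVM
```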
Ok, yeah, that makes sense. Thanks for the context on this. For future reference, is there any kind of "Using the LLVM API for Morello targets - hints & tips" documentation / FAQ anywhere (other than ploughing through the clang source code to see what it does)?
Also, out of interest, why address space #200? Any logic behind that (just wondered)?