LLVM Address Space for Morello Port and IR Optimisation

Hi

I've recently been working on a Linux application which uses the LLVM C++ API to generate IR targeted at Morello pure-cap and then compile this to C64 object code.

I realised that the AArch64 backend modifications for Morello use a non-default address space, 200, for capabilities - everything works fine though (albeit was challenging as I am running the compilation on x86_64 but generating IR with a Morello pure-cap data layout).

However, I note that for the fat pointers of AS 200, the index size is set to 64 bits, not 128 bits. This prevents the load/store vectorization pass from being used during IR optimisation (the LLVM library asserts if pointer_size != index_size).

Q1: From comments in the code, it looks like using 64-bit index size instead of 128-bit for capability pointers in the data layout is a temporary measure.  Do Arm have a timescale of when it can be updated to 128-bit?

Q2: This second question is really aimed at the community in general... does anyone have experience of whether skipping this optimisation pass is going to make any real world difference to the performance of compiled IR on Morello?  From LLVM code comments it looks like Aarch64 won't really benefit from this anyway... I haven't yet done any tests of my own due to other work commitments, but it is something I was planning to investigate (on Morello hybrid, probably).  Would be interesting to hear if Arm / anyone else has input into this, if not, I'll update if/when I get any useful metrics myself.

Many Thanks

Pete

  • albeit was challenging as I am running the compilation on x86_64 but generating IR with a Morello pure-cap data layout

    Out of interest, what were the issues with that? LLVM doesn't really care about the host architecture when compiling other than for setting defaults, so as long as you set the right triple, ABI, DataLayout and target features it shouldn't matter where you run the compilation. Even running on Morello it won't (currently) default to the right thing for purecap code generation.

    Q1: From comments in the code, it looks like using 64-bit index size instead of 128-bit for capability pointers in the data layout is a temporary measure. 

    No, that's by design. The index size is how big your offsets are, which are 64-bit. A 128-bit index size would mean you had zero metadata bits. Where are you seeing comments that make you think otherwise?

    Q2: This second question is really aimed at the community in general... does anyone have experience of whether skipping this optimisation pass is going to make any real world difference to the performance of compiled IR on Morello?  From LLVM code comments it looks like Aarch64 won't really benefit from this anyway...

    If you mean LoadStoreVectorizer, that only gets added for GPU targets (AMDGPU and NVPTX). Vectorisation involving loads and stores is definitely a thing that already happens for Morello; see https://cheri-compiler-explorer.cl.cam.ac.uk/z/vGvxM9 for an example, which gets vectorised by SLPVectorizer.

    on Morello hybrid, probably

    Why are you interested in hybrid? At Cambridge we're generally of the view that you shouldn't use hybrid unless you absolutely have to, and even then you should think twice. Doing anything non-trivial quickly becomes painful, with __capability annotations everywhere and a lack of interoperability with non-capability-taking functions, including basic things like the C standard library.

  • Hi Jessica, thanks for your reply.  Yes I meant LoadStoreVectorizer() so that is useful to know it is only applicable for GPU targets, which means I don't need to worry about it!

    To answer your other points...

    1. I'm not directly doing a cross-compilation. What is actually happening is that WebAssembly is translated to IR, which is then compiled to object code, plus various WebAssembly data sections and tables which are parsed at runtime (on Morello). So I am not generating an ELF file but a bespoke file format.

    The reason it was so tricky is that the WASM-to-IR compilation assumes a 64-bit pointer size and alignment, so structures containing pointers are not laid out as they would be on Morello pure-cap. So, for example, any GEP instructions will have indexes that aren't compatible with the layout at runtime, because the runtime side is compiled for Morello pure-cap only. Hope that makes sense.

    It is BTW further complicated by the fact that the compile side running on x86_64 should be built to be capable of compiling this output format for any target, with the target specified at runtime.

    2. Clearly, then, I misunderstood the below comments in the LLVM IR::DataLayout::parseSpecifier():

    // Size of index used in GEP for address calculation.
    // The parameter is optional. By default it is equal to size of pointer.
    // XXXAR: For compatibility make isFat default to index width = 64 bits so
    // we don't have to add the index width to the datalayout immediately
    unsigned IndexSize = isFat ? 8 : PointerMemSize;

    I (now clearly erroneously) interpreted this comment as saying "we don't have to implement this right away, we'll do it later".

    3. No, I am not interested in hybrid. What I was saying was that if I were to analyse the performance with and without the LoadStoreVectorizer() pass, then I would likely do that on Morello hybrid (as I don't have another AArch64 target readily available). Obviously I couldn't analyse it on Morello pure-cap because, as explained, LLVM asserts if you try to add that optimisation pass.

    Anyway, that's superfluous now based on your answer, so thank you for clearing that up.

    Pete.

  • 2. Clearly, then, I misunderstood the below comments in the LLVM IR::DataLayout::parseSpecifier():

    // Size of index used in GEP for address calculation.
    // The parameter is optional. By default it is equal to size of pointer.
    // XXXAR: For compatibility make isFat default to index width = 64 bits so
    // we don't have to add the index width to the datalayout immediately
    unsigned IndexSize = isFat ? 8 : PointerMemSize;

    I (now clearly erroneously) interpreted this comment as saying "we don't have to implement this right away, we'll do it later".

    This is:

    * We didn't use to specify the index size but had our own hacks to introduce an inferred one

    * Upstream LLVM added a proper notion of index width, so we made it default to the right value for CHERI capabilities

    * We went and explicitly specified it everywhere

    <--- You are / Morello LLVM is here

    * We went and made it required for capabilities rather than inferred (long-overdue)

    <--- CHERI LLVM is here as of November last year

  • Ok, yeah makes sense.  Thanks for the context on this.  For future reference, is there any kind of "Using the LLVM API for Morello targets - hints & tips" documentation / FAQ anywhere? (other than ploughing through the clang source code to see what it does).

    Also, out of interest, why address space #200?  Any logic behind that (just wondered)?