
LLVM Address Space for Morello Port and IR Optimisation

Hi

I've recently been working on a Linux application which uses the LLVM C++ API to generate IR targeted at Morello pure-cap and then compile this to C64 object code.

I realised that the AArch64 backend modifications for Morello use a non-default address space, 200, for capabilities. Everything works fine, though it was challenging because I am running the compilation on x86_64 while generating IR with a Morello pure-cap data layout.

However, I note that for the fat pointers of AS 200, the index size is set to 64 bits, not 128 bits. This prevents a load/store vectorisation pass from running during IR optimisation (the LLVM library asserts if the pointer size != index size).

Q1: From comments in the code, it looks like using a 64-bit index size instead of 128-bit for capability pointers in the data layout is a temporary measure. Do Arm have a timescale for when it can be updated to 128-bit?

Q2: This second question is really aimed at the community in general: does anyone have experience of whether skipping this optimisation pass makes any real-world difference to the performance of compiled IR on Morello? From the LLVM code comments it looks like AArch64 won't really benefit from it anyway. I haven't yet done any tests of my own due to other work commitments, but it is something I was planning to investigate (on Morello hybrid, probably). It would be interesting to hear whether Arm or anyone else has input on this; if not, I'll update this thread if/when I get any useful metrics myself.

Many Thanks

Pete

Reply
  • albeit was challenging as I am running the compilation on x86_64 but generating IR with a Morello pure-cap data layout

    Out of interest, what were the issues with that? LLVM doesn't really care about the host architecture when compiling other than for setting defaults, so as long as you set the right triple, ABI, DataLayout and target features it shouldn't matter where you run the compilation. Even running on Morello it won't (currently) default to the right thing for purecap code generation.
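    For anyone setting up the same thing: the key is just to stamp the module with the Morello triple and purecap data layout up front. A sketch of what the module header ends up looking like is below; note that the exact data layout string varies between CHERI/Morello LLVM versions, so it is safer to obtain it via TargetMachine::createDataLayout() in the C++ API than to hard-code it.

    ```llvm
    ; Module header for Morello purecap IR, generated on any host.
    ; NOTE: this datalayout string is version-dependent -- prefer
    ; querying TargetMachine::createDataLayout() over hard-coding it.
    target datalayout = "e-m:e-pF200:128:128:128:64-i8:8:32-i16:16:32-i64:64-i128:128-n32:64-S128-A200-P200-G200"
    target triple = "aarch64-unknown-linux-gnu"
    ```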

    Q1: From comments in the code, it looks like using 64-bit index size instead of 128-bit for capability pointers in the data layout is a temporary measure. 

    No, that's by design. The index size is how big your offsets are, which are 64-bit. A 128-bit index size would mean you had zero metadata bits. Where are you seeing comments that make you think otherwise?
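    This is visible directly in the purecap data layout's pointer specification for AS 200, which follows LLVM's p[n]:&lt;size&gt;:&lt;abi&gt;:&lt;pref&gt;:&lt;idx&gt; form (the F is a CHERI LLVM extension marking the address space as holding capabilities):

    ```llvm
    ; pF200:128:128:128:64 decodes as:
    ;   address space 200, pointer size 128 bits, ABI and preferred
    ;   alignment 128 bits, index (offset) width 64 bits.
    ; The 64-bit index matches the capability's address field; the rest
    ; of the representation holds bounds/permissions metadata.
    ```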

    Q2: This second question is really aimed at the community in general... does anyone have experience of whether skipping this optimisation pass is going to make any real world difference to the performance of compiled IR on Morello?  From LLVM code comments it looks like Aarch64 won't really benefit from this anyway...

    If you mean LoadStoreVectorizer, that only gets added for GPU targets (AMDGPU and NVPTX). Vectorisation involving loads and stores is definitely a thing that already happens for Morello; see https://cheri-compiler-explorer.cl.cam.ac.uk/z/vGvxM9 for an example, which gets vectorised by SLPVectorizer.
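    For anyone wanting to experiment locally, a minimal pattern of the kind SLPVectorizer typically picks up is straight-line code with independent operations on adjacent elements (the function and names below are just illustrative; compile with -O2/-O3 and inspect the IR or disassembly for vector ops):

    ```cpp
    #include <cstdio>

    // Four independent adds on adjacent array elements -- the straight-line
    // shape that SLPVectorizer can merge into one vector load/add/store.
    void add4(float *__restrict a, const float *__restrict b,
              const float *__restrict c) {
        a[0] = b[0] + c[0];
        a[1] = b[1] + c[1];
        a[2] = b[2] + c[2];
        a[3] = b[3] + c[3];
    }

    int main() {
        float b[4] = {1, 2, 3, 4}, c[4] = {10, 20, 30, 40}, a[4];
        add4(a, b, c);
        for (float v : a) std::printf("%g ", v);
        std::printf("\n");  // prints: 11 22 33 44
        return 0;
    }
    ```

    The compiler-explorer link above shows the same idea on the Morello purecap target.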

    on Morello hybrid, probably

    Why are you interested in hybrid? At Cambridge we're generally of the view that you shouldn't use hybrid unless you absolutely have to, and even then you should think twice. Doing anything non-trivial quickly becomes painful, with __capability annotations everywhere and a lack of interoperability with non-capability-taking functions, including basic things like the C standard library.
