Linux Pure-cap: CSP Invalidated on indexed load/store when running mmap()d code buffer

Hi

I hope someone can help with a strange problem I am having with Linux pure-cap; I cannot work out what I am doing wrong.

I am trying to mmap() a buffer, write some asm code into it, and then execute that code by branching to an address inside the buffer.

Everything works, except that any load or store with pre-indexing or post-indexing causes the capability register used for addressing to be invalidated (its tag goes to 0), as if I had updated it with a non-capability value.

Here are some examples of what causes the problem (note: not real code, just example operations):

STR X0, [CSP], #16
LDR C1, [C0, #16]!
STP C1, C2, [C16, #-32]!

But these instructions, for example, work OK (i.e. they do not invalidate the tag of the capability register used for addressing):

SUB CSP, CSP, #48
LDR X0, [CSP, #16]
STP C1, C2, [C0, #32]

The classic example is a fault on CSP, because the compiler typically emits a "store registers to the stack and decrement CSP" sequence at the start of a function. The result is a segfault the next time CSP is used after the code returns from the mmap() buffer.

Here is a short, cut-down example to illustrate what I am doing:

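#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <cheriintrin.h>

static const uint8_t asmData[] = {
    0xe1, 0x0b, 0xbf, 0x62, /* stp c1, c2, [csp, #-32]! */
    0xff, 0x83, 0x00, 0x02, /* add csp, csp, #32 */
    0xc0, 0x53, 0xc2, 0xc2  /* ret c30 */
};

typedef void (*FUNC_PTR)(void);
#define BUF_LEN 4096

int main(void)
{
    // Create the mapping with RWX as its maximum permissions
    uint8_t *code_buf = (uint8_t *)mmap(NULL, BUF_LEN,
        PROT_MAX(PROT_READ | PROT_WRITE | PROT_EXEC), /* PROT_MAX() */
        MAP_PRIVATE | MAP_ANON | MAP_NORESERVE,       /* Flags */
        -1, 0);
    if (code_buf == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    // Make buffer writable
    if (mprotect(code_buf, BUF_LEN, PROT_READ | PROT_WRITE) != 0) {
        perror("mprotect (RW)");
        return 1;
    }

    // Copy code into the buffer
    memcpy(code_buf, asmData, sizeof(asmData));

    // Make buffer executable
    if (mprotect(code_buf, BUF_LEN, PROT_READ | PROT_EXEC) != 0) {
        perror("mprotect (RX)");
        return 1;
    }

    // Build a sealed entry (sentry) capability for the start of the buffer
    FUNC_PTR fp = (FUNC_PTR)cheri_sentry_create(code_buf);

    // Run from buffer
    fp();

    munmap(code_buf, BUF_LEN);
    return 0;
}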
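
In case anyone wants to cross-check the hand-assembled bytes against a disassembler: AArch64 instructions are stored little-endian, so each group of four bytes in asmData forms one 32-bit instruction word, least significant byte first. The snippet below is only a minimal sketch of that conversion; the helper name insn_word and the array name code_bytes are made up for illustration and are not part of my real code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same bytes as asmData in the example above. */
static const uint8_t code_bytes[] = {
    0xe1, 0x0b, 0xbf, 0x62,
    0xff, 0x83, 0x00, 0x02,
    0xc0, 0x53, 0xc2, 0xc2
};

/* Hypothetical helper: return the 32-bit instruction word that starts at
 * byte offset `off`, assembling it least significant byte first. */
static uint32_t insn_word(const uint8_t *buf, size_t off)
{
    assert(off % 4 == 0); /* instructions are 4-byte aligned */
    return (uint32_t)buf[off]
         | ((uint32_t)buf[off + 1] << 8)
         | ((uint32_t)buf[off + 2] << 16)
         | ((uint32_t)buf[off + 3] << 24);
}
```

With the bytes above, insn_word(code_bytes, 0) gives 0x62bf0be1, offset 4 gives 0x020083ff and offset 8 gives 0xc2c253c0, which are the words to compare against the disassembly listing.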

What is wrong with the above?

If the same code runs from an asm function built into my executable, everything works fine.

Note: other things I have tried:

  • mmap() a data buffer and use this as the load/store destination address
  • Link my exe to a fixed memory address and mmap() a fixed address (including the MAP_FIXED flag)
  • Clean/invalidate the data cache with DC CIVAC before branching to the mmap()'d buffer
  • Clear/invalidate caches with __builtin___clear_cache()
  • Build with LLVM/Clang and also with GCC
  • Add NOPs before the STP instruction in the buffer in case a pipeline needs flushing (?)
  • Use cheri_perm_and() and cheri_bounds_set() to make my code_buf capability match PCC as closely as possible (i.e. all the same permission flags)
  • Branch to restricted mode when calling the code in the mmap() buffer, using BRR <addr> and then RETR, after first setting up RCSP
  • Ensure all memory buffers are aligned to at least 1024 bytes, and try using an offset into the mmap() buffer (i.e. start at &buffer[1024])

My SW stack is the v1.6 Morello release. The LLVM toolchain I am using was built from morello/dev about a month ago (I can provide the git commit hash if required). If the silicon version of the Morello HW matters, please advise what information I would need to provide.

Hopefully it is just something I am doing wrong, as this problem is pretty much a showstopper for the work I am doing.

Thanks

Pete