I'm trying to write a very simple function of two or three aarch64 instructions as inline assembler inside a C++ source file.

With the aarch64 calling convention on Linux, if a function returns a very large struct by value, the address at which to store the return value is passed in the X8 register. This is unusual among calling conventions: the others I know of (for example System V x86_64, Microsoft x64, cdecl, stdcall and arm32) pass the address of the return value as a hidden first parameter. So, for example, with x86_64 on Linux the RDI register contains the address at which to store the very large struct.

I want to try to emulate this behaviour on aarch64 on Linux. When my assembler function is entered, I want it to do two things:

(1) Put the address of the indirect return object into the first parameter register, i.e. move X8 to X0.
(2) Jump to a location specified by a global function pointer.

So here's how I think my assembler function should look:
```cpp
__asm("Invoke: \n"
      " mov x0, x8 \n"  // move return value address into 1st parameter
      " mov x9, f \n"   // Load address of code into register
      " br x9 \n"       // Jump to code
);
```
I don't know what's wrong here, but it doesn't work. In the complete C++ program below I use the class 'std::mutex', as it's a good example of a class that can be neither copied nor moved (I am relying on C++17's mandatory copy elision, i.e. guaranteed Return Value Optimisation).

Here is my entire program in one C++ file. Could someone please help me write the assembler function properly? Am I supposed to be using the ADRP and LDR instructions instead of MOV?
```cpp
#include <mutex>     // mutex
#include <iostream>  // cout, endl

using std::cout, std::endl;

void (*f)(void) = nullptr;

extern "C" void Invoke(void);

__asm("Invoke: \n"
      " mov x0, x8 \n"  // move return value address into 1st parameter
      " mov x9, f \n"   // Load address of code into register
      " br x9 \n"       // Jump to code
);

void Func(std::mutex *const p)
{
    cout << "Address of return value: " << p << endl;
}

int main(void)
{
    f = (void(*)(void))Func;
    auto const p = reinterpret_cast<std::mutex (*)(void)>(Invoke);
    auto retval = p();
    cout << "Address of return value: " << &retval << endl;
}
```
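As far as I understand it, X8 matters here because a type like 'std::mutex' can be neither copied nor moved, so C++17's guaranteed copy elision makes the callee construct the result directly in the caller's storage. The address of that storage is exactly the hidden result pointer (X8 on aarch64, RDI on System V x86_64). A minimal portable sketch of that guarantee, using a hypothetical 'Probe' type that records where its constructor ran:

```cpp
#include <cassert>

// Hypothetical type: like std::mutex it can be neither copied nor moved
// (declaring the copy constructor suppresses the implicit move constructor),
// and it remembers the address at which its constructor ran.
struct Probe {
    const void* constructed_at;
    Probe() : constructed_at(this) {}
    Probe(const Probe&) = delete;
    Probe& operator=(const Probe&) = delete;
};

// Compiles only because C++17 guarantees elision for a returned prvalue:
// the object is built straight into the caller's storage.
Probe make_probe() { return Probe{}; }

void check_in_place() {
    Probe p = make_probe();
    assert(p.constructed_at == &p);  // constructor ran at the caller's address
}
```

If 'check_in_place' passes, the callee and the caller saw the same object address, which is what the assembler trampoline tries to preserve when it moves X8 into X0.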
Instead of using inline assembler inside a C++ source file, I tried making a separate assembler file. Here's what I have, but it still doesn't work: it's still segfaulting inside 'detail_Invoke'.
```asm
    .text
    .global tl_p
.Addr_tl_p:
    .xword tl_p

    .global detail_Invoke
detail_Invoke:
    adrp x9, [.Addr_tl_p]
    ldr  x9, [x9]
    mov  x10, x9
    br   x10
```
'adrp' only loads the address of the 4 KiB page containing the symbol. You also need to add the symbol's offset within that page, which is what the ':lo12:' relocation provides. I would try something like:
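```asm
    .text
    .global tl_p
    .p2align 3
.Addr_tl_p:
    .xword tl_p                       // holds the ADDRESS of tl_p

    .global detail_Invoke
    .type detail_Invoke, %function
detail_Invoke:
    mov  x0, x8                       // result address -> 1st parameter
    adrp x9, .Addr_tl_p               // 4 KiB page containing .Addr_tl_p
    add  x9, x9, :lo12:.Addr_tl_p     // plus the offset within that page
    ldr  x9, [x9]                     // x9 = &tl_p
    ldr  x9, [x9]                     // x9 = tl_p, the function pointer
    br   x9
```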
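To make the page arithmetic concrete, here is a small model of what the 'adrp' + 'add :lo12:' pair computes, treating addresses as plain integers. The function names are made up for illustration, and the model ignores that the real ADRP immediate is a signed 21-bit page count (about ±4 GiB of reach):

```cpp
#include <cstdint>

constexpr std::uint64_t kPageMask = ~std::uint64_t{0xFFF};  // 4 KiB pages

// Model of `adrp xN, sym`: the linker encodes (page(sym) - page(pc)) >> 12
// in the instruction; the CPU shifts it back and adds it to page(pc).
// Unsigned wrap-around handles backward references.
constexpr std::uint64_t adrp(std::uint64_t pc, std::uint64_t sym) {
    const std::uint64_t page_delta = (sym & kPageMask) - (pc & kPageMask);
    return (pc & kPageMask) + ((page_delta >> 12) << 12);
}

// Model of `add xN, xN, :lo12:sym`: the low 12 bits of sym's address.
constexpr std::uint64_t lo12(std::uint64_t sym) { return sym & 0xFFF; }

// ADRP alone only reaches the start of the page...
static_assert(adrp(0x400123, 0x411abc) == 0x411000);
// ...adding :lo12: recovers the exact address, also for backward references.
static_assert(adrp(0x400123, 0x411abc) + lo12(0x411abc) == 0x411abc);
static_assert(adrp(0x411ff0, 0x400010) + lo12(0x400010) == 0x400010);
```

The add can also be folded into the load, e.g. 'ldr x9, [x9, #:lo12:sym]', which is the form GCC usually emits when loading a global variable.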
The following works in my C++ source file:
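```cpp
extern "C" {
    void (*f)(void) = nullptr;
    void detail_Invoke(void);
}

__asm(
    ".text \n"
    "detail_Invoke: \n"
    " mov x0, x8 \n"    // move return value address into 1st parameter
    " adr x9, f \n"     // address of f (ADR reaches only about +/-1 MiB)
    " ldr x9, [x9] \n"  // load the function pointer stored in f
    " br x9 \n"         // jump to it
);
```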
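The 'adr' in the working version only reaches about ±1 MiB from the instruction, so the link can fail with a relocation-out-of-range error once the program grows. Below is a sketch of a longer-range variant, using the same 'adrp' + ':lo12:' pair as the answer above but loading straight from 'f'. It assumes the code ends up in the executable rather than a shared library (where 'f' would have to be reached through the GOT), and it uses x16, the intra-procedure-call scratch register, for the jump target. The '#if defined(__aarch64__)' guard only keeps the file compiling on other targets, and the '.pushsection'/'.popsection' pair restores whatever section the compiler was in:

```cpp
extern "C" {
void (*f)(void) = nullptr;  // plain (non-thread_local) global, as in the question
void detail_Invoke(void);
}

#if defined(__aarch64__)
__asm(
    ".pushsection .text\n"
    ".globl detail_Invoke\n"
    ".type detail_Invoke, %function\n"
    ".p2align 2\n"
    "detail_Invoke:\n"
    "    mov  x0, x8\n"               // hidden result address -> 1st argument
    "    adrp x16, f\n"               // 4 KiB page containing f (+/-4 GiB reach)
    "    ldr  x16, [x16, #:lo12:f]\n" // load the pointer stored in f
    "    br   x16\n"                  // tail-jump to *f
    ".size detail_Invoke, .-detail_Invoke\n"
    ".popsection\n");
#endif
```

Usage is the same as in the question: set 'f', then call through 'detail_Invoke'.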
Now I just need to make one change to it. 'f' is actually a thread_local global variable defined as follows:
```cpp
thread_local void (*f)(void) = nullptr;
```

Does anyone know how I can access a global thread_local variable from within the inline assembler in my C++ source file?
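One way to handle the thread_local case, sketched under the assumption that the simplest robust route is to let the compiler do the TLS access: the trampoline calls a small extern "C" accessor, here given the hypothetical name 'detail_get_f', which returns the current thread's 'f', and then jumps to it. Because the compiler generates the TLS sequence, this should work whichever TLS model applies. The price is a real call, so X8 and X30 are saved around it; if the target took more parameters than the result pointer (the question's 'Func' does not), x1 to x7 would need saving too. Reading the variable directly in assembly is also possible: for a variable in the main executable I believe GCC emits a local-exec sequence like 'mrs x16, tpidr_el0', 'add x16, x16, #:tprel_hi12:f, lsl #12', 'add x16, x16, #:tprel_lo12_nc:f', then 'ldr x16, [x16]', but that breaks if the code ever moves into a shared library.

```cpp
extern "C" {
using Target = void (*)(void);

// Per-thread jump target, as in the question.
thread_local Target f = nullptr;

// Hypothetical accessor: ordinary compiled code, so the compiler emits
// whatever TLS access sequence the current TLS model requires.
Target detail_get_f(void) { return f; }

void detail_Invoke(void);
}

#if defined(__aarch64__)
__asm(
    ".pushsection .text\n"
    ".globl detail_Invoke\n"
    ".type detail_Invoke, %function\n"
    ".p2align 2\n"
    "detail_Invoke:\n"
    "    stp  x8, x30, [sp, #-16]!\n" // save result pointer and return address (sp stays 16-byte aligned)
    "    bl   detail_get_f\n"         // x0 = this thread's f
    "    mov  x16, x0\n"              // keep the jump target in x16 (IP0 scratch register)
    "    ldp  x8, x30, [sp], #16\n"   // restore x8 and the caller's return address
    "    mov  x0, x8\n"               // result address -> 1st argument
    "    br   x16\n"                  // tail-jump; the target returns straight to our caller
    ".size detail_Invoke, .-detail_Invoke\n"
    ".popsection\n");
#endif
```

Each thread sets its own 'f' before calling through 'detail_Invoke', exactly as 'main' does in the question.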