I am using movw and movt to load a label address into a register on the 32-bit Arm architecture, but this is not position-independent code:
    movw r1, #:lower16:ASM_NAME(forkx)
    movt r1, #:upper16:ASM_NAME(forkx)
The manual also says that these will be resolved at link time.
I need position-independent code. According to the manual, adr or adrl can be used, but I get the errors below:
../asm-arm/unix_arm.S:115:1: error: unsupported relocation on symbol
    adr r1, __be_forkx
../asm-arm/unix_arm.S:60:1: error: invalid instruction, did you mean: adr?
    adrl r1, __be_forkx
It seems a label cannot be used this way in AArch32; in AArch64 it is fine and works as intended.
Is my usage of the adr instruction improper? Is there a way to achieve this in AArch32? Is there an equivalent instruction that can be used?
DeepakHegde said:
../asm-arm/unix_arm.S:115:1: error: unsupported relocation on symbol
    adr r1, __be_forkx
../asm-arm/unix_arm.S:60:1: error: invalid instruction, did you mean: adr?
    adrl r1, __be_forkx
The encoding of the adr instruction (in the A32 ISA) limits it to locations within a certain range of that instruction [the 4KB-either-side limit is for T32]. Does the location "__be_forkx" satisfy that constraint?
Why is there a need for generating PIC devoid of any relocations?
Edit: Correction in the range of adr. I originally listed the restriction for T32.
Thanks for the reply surati,
1) The label __be_forkx is in another link library, so it may not satisfy the 4K condition. That is why I wanted to use adrl, but it seems to be unsupported. The 64-bit version uses adrp and add.
2) We need to generate PIC so that addresses cannot be tracked and the binary does not use the same addresses on every invocation.
This is hand-written assembly, so is there an equivalent, PIC-friendly operation that can be used to avoid this?
DeepakHegde said: this may not satisfy 4K condition
Apologies, the 4K limit applies to T32. For example, the A32 adr instruction "x: add r0,pc,#0x20000000" calculates the address of a location 512MB above pc (i.e., r0 = x+8+0x20000000). The linked article describes how valid offsets are calculated.
DeepakHegde said: label __be_forkx is there in another link library
Is that library linked statically with your application, or dynamically? I am trying to understand why adrp worked here but adr fails.
DeepakHegde said: We need to generate PIC enabled code to make sure that addresses not tracked and to avoid the same address in every invocation of the binary.
I did not understand this. Oh, maybe it is address space randomization?
It is statically linked in the end. Still, with PIC enabled, the location of __be_forkx is only known at run time, so it should be possible to load it at run time.
I think adrp works here because it loads the starting address of a page and can therefore reach further, but adrp does not exist in A32.
Yes, your understanding is correct: I want to achieve Address Space Layout Randomisation (ASLR).
You mentioned that the linked article specifies how valid offsets are calculated. Is there a link I can read to work out the offset? Is there a way I can get this done?
DeepakHegde said: You mentioned linked article specifies the way to calculate the valid address, is there a link that i can read and get the offset?
The valid offsets accepted by the A32 adr instruction are given here: "... can have any value that can be produced by rotating an 8-bit value right by any even number of bits within a 32-bit word".
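To make the quoted rule concrete, here is a small Python sketch that tests whether a constant fits it. It is only an illustration of the arithmetic, not something taken from the manual; the helper names ror32 and is_a32_immediate are made up for this example.

```python
def ror32(value, amount):
    """Rotate a 32-bit value right by `amount` bits."""
    amount %= 32
    value &= 0xFFFFFFFF
    return ((value >> amount) | (value << (32 - amount))) & 0xFFFFFFFF


def is_a32_immediate(value):
    """True if `value` is an 8-bit constant rotated right by an even amount."""
    value &= 0xFFFFFFFF
    for rotation in range(0, 32, 2):
        # Rotating right by (32 - rotation) undoes a right-rotation by `rotation`.
        if ror32(value, (32 - rotation) % 32) <= 0xFF:
            return True
    return False


if __name__ == "__main__":
    # 0x20000000 is the offset used in the adr example earlier in the thread.
    for candidate in (0x20000000, 0x3FC, 0x1FE, 0x104, 0x102):
        print(hex(candidate), is_a32_immediate(candidate))
```

Note that 0x1FE and 0x102 are rejected even though they span only eight or nine bits: the 8-bit window would have to start at an odd bit position, which the "even number of bits" part of the rule does not allow.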
Was the adr instruction originally an "ldr r1, =__be_forkx" instruction?
Since the modification succeeds with A64-adrp instruction, you may want to confirm that the failure of A32-adr instruction is actually because of the range limitation.
A32-adrl is supported, at least by GNU as. The invalid-instruction error, upon encountering adrl, points towards the assembler. One can write a small testcase to see if the assembler recognizes adrl. If not, since adr/l are, beneath the surface, add/sub instructions, they can be hand-coded using add/sub, although I did read somewhere that such a practice is discouraged.
Assuming that a relocatable, bare-metal binary is being built, would it not be simpler to build it as a shared object? Such a build should emit appropriate relocations that some startup code can easily fix.
That startup code may have to be written carefully, but the majority of the binary may not need such instruction-level modifications.
You may also want to investigate, within the limits imposed by your employer and by licenses involved, how other kernels implement relocations/randomizations.
Thanks for the input surati,
I checked: the adr issue is the range. If I put the label in the same file and then use adr, it works.
I am using the clang compiler here, so maybe it does not support adrl. I will cross-check this with GCC.
This assembly file has to be part of the binary. I could make it part of a shared library, but that shared library would still not be ASLR compliant.
I checked other kernels on the x86 and MIPS architectures: if PC-relative instructions are used, the code is ASLR compliant.
So, going by the link given above, the only possible way is to keep the label within a 4096-byte offset in A32. adr supports only this, so I need to look for an option with a longer reach.
DeepakHegde said: i checked the adr issue is with the range, if we add the label in the same file and then try for adr then it is fine.
True, adr/l may not work for external symbols. GNU as complains about undefined symbol even if that symbol is declared as global (IMPORTed in clang/armasm).
DeepakHegde said: using the clang compiler here, so may be this compiler is not supported with adrl
In that case, one can manually emit two add/sub instructions, the first one with "add/sub r1,pc,offset1" and second one with "add/sub r1,r1,offset2", such that the final value in r1 is the address required. The offset1 value must be such that "it can be produced by rotating an 8-bit value right by any even number of bits within a 32-bit word".
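As a rough sketch of how the two offsets could be picked by hand, the Python below splits a distance into two constants that each satisfy the rotation rule. It is an assumption-laden illustration: the function names are invented here, the example distance 0x41028 is hypothetical, and the search only tries splits along 8-bit windows, so it may miss other valid decompositions. The distance has to be measured from the value pc holds in the first add, which in A32 is the address of that instruction plus 8.

```python
def ror32(value, amount):
    """Rotate a 32-bit value right by `amount` bits."""
    amount %= 32
    value &= 0xFFFFFFFF
    return ((value >> amount) | (value << (32 - amount))) & 0xFFFFFFFF


def is_a32_immediate(value):
    """True if `value` is an 8-bit constant rotated right by an even amount."""
    value &= 0xFFFFFFFF
    return any(ror32(value, (32 - rot) % 32) <= 0xFF for rot in range(0, 32, 2))


def split_offset(offset):
    """Return (offset1, offset2) with offset1 + offset2 == offset, or None.

    Each part is encodable as an A32 immediate, so the pair can be used as
        add r1, pc, #offset1
        add r1, r1, #offset2
    """
    offset &= 0xFFFFFFFF
    for rotation in range(0, 32, 2):
        window = ror32(0xFF, rotation)          # an 8-bit window at an even position
        offset1 = offset & window               # encodable by construction
        offset2 = offset & ~window & 0xFFFFFFFF
        if is_a32_immediate(offset2):
            return offset1, offset2
    return None


if __name__ == "__main__":
    # Hypothetical distance between (first add + 8) and the target label.
    print(split_offset(0x41028))
    # A distance whose set bits are too spread out for two instructions.
    print(split_offset(0x12345678))
```

When the function returns None, two add/sub instructions are not enough under this scheme, and a literal holding the distance (loaded with ldr) is the more practical route.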
DeepakHegde said: checked in other kernel with x86, mips architecture, if we use the pc relocatable instruction then it will be ASLR compliant.
The Linux kernel configuration for arm64 enables KASLR by default. The vmlinux binary thus built contains many relocations that the kernel's startup routine fixes at runtime. It may not be practical, or even possible, in certain cases, to utilize pc-relative addresses.
DeepakHegde said: only possible way is to have the label within the 4096B offset in A32.
I am sorry for the confusion. With A32, one has access to a considerably larger region than just 4KB. You may want to run a few test cases with the GNU assembler and the adrl instruction, and see how the addresses are generated there.
Thanks a lot for all your time and help.
So the suggestion is to add or subtract an offset relative to the PC to get the real address. I understand the suggestion and am trying to implement it.
I tried to do this with add and sub, but got:
../asm-arm/unix_arm.S:116:9: error: expected relocatable expression
    add r1, pc, __be_forkx
../asm-arm/unix_arm.S:116:9: error: expected relocatable expression
    add r1, pc, #__be_forkx
I have used a similar add instruction in A64, as below, and it works fine to get the label offset.
    adrp x4, __be_forkx
    add x4, x4, :lo12:__be_forkx
Does A32 have some specific way to do this? I checked the add instruction in the manual below:
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.100069_0609_00_en/pge1425889876961.html&_ga=2.3850608.1165552664.1585541156-1811185655.1581583184
It says the operand needs to be a constant, as below:
ADD Rd, pc, #imm
imm: range 0-1020, word aligned.
Rd: must be a Lo register.
Bits[1:0] of the PC are read as 0 in this instruction.
DeepakHegde said: I have used the similar add instruction in A64 as below and it works fine to get the label offset.
    adrp x4, __be_forkx
    add x4, x4, :lo12:__be_forkx
I tested the above snippet; it generates two relocations. So, it cannot be said to have worked.
Edit: GNU assembler's info on adrp relocations.
It is working for me and gets the proper address of forkx; I verified this with gdb.
adrp loads the start address of the 4K page into x4, i.e. the address with its lower 12 bits masked off.
The add instruction above then adds the lower 12 bits to x4, which makes up the proper address.
But I cannot do the same in A32, as adrp is not present, so I am looking for an alternative.
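The page-plus-low-bits arithmetic described here can be modelled in a few lines of Python. This is only a sketch of the address calculation, not of the instruction encoding; the target address is the one shown by gdb in this thread, while the pc values and the function names are hypothetical.

```python
PAGE_MASK = 0xFFF  # low 12 bits: the offset inside a 4K page


def adrp_value(pc, target):
    """Model of what adrp leaves in the register: the 4K page holding `target`.

    The instruction carries only the page-granular distance between the page
    of pc and the page of target, fixed when the binary is linked.
    """
    page_delta = (target >> 12) - (pc >> 12)
    return (pc & ~PAGE_MASK) + (page_delta << 12)


def lo12(target):
    """Model of the :lo12: operand: the offset of `target` inside its page."""
    return target & PAGE_MASK


target = 0x55583DD5C0   # &forkx as printed by gdb in this thread
pc = 0x55583A1234       # hypothetical address of the adrp instruction

page = adrp_value(pc, target)      # register value after adrp
address = page + lo12(target)      # register value after add ..., :lo12:

# If the whole image slides by a page-aligned amount, the same page distance
# and the same low 12 bits still yield the moved address.
slide = 0x7000
moved = adrp_value(pc + slide, target + slide) + lo12(target + slide)

print(hex(page), hex(address), hex(moved))
```

The last step is why the sequence survives a different load address without any fix-up, as long as the distance between the adrp instruction and forkx stays the same.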
DeepakHegde said: it is working for me, and is getting the proper address of forkx, verified this with gdb.
Do you mean to say that you already handle the adrp relocations? If not, the next time your binary is run, the address of forkx will change, and there won't be anybody to fix it.
In A64 I have done this, and with gdb I can see that the proper address is loaded.
1. After the adrp:
(gdb) info registers
x0 0x30a0 12448
x1 0x0 0
x2 0x7fdf1bf6b0 549203998384
x3 0x55ad57f898 367980443800
x4 0x55583dd000 367089537024
x5 0x0 0
x6 0x1 1
2. After the add:
x4 0x55583dd5c0 367089537024
3. This is the address we get from the instruction adrp x4, ASM_NAME(forkx). Here we are loading the address of the forkx structure into x4; printing the address of forkx shows the same value:
(gdb) print &forkx
$13 = (sprocess * __be *) 0x55583dd5c0 <__be_forkx>
Every time the image is loaded, I can see that this address is correct and everything works. The checks for PIC/PIE and TEXTREL also pass on the created A64 image.
But I need the same for A32.
I think that the offsets calculated by adrp are being forced to be included in the resulting binary without the help of relocations. This would then mean that the location of forkx with respect to the adrp instruction must be kept fixed.
You can do this with A32, assuming that the following distances do not change across different runs of the same binary:
/* 1.s */
        .text
        .global forkx
        .global _start
_start:
        nop
        nop
load_dist:
        ldr r1, dist_forkx      /* r1 = forkx - (load_addr + 8) */
load_addr:
        ldr r0, [pc, r1]        /* pc reads as load_addr + 8; loads the word stored at forkx */
dist_forkx:
        .word forkx-load_addr-8

/* 2.s */
        .text
        .fill 0x41020
        .global forkx
forkx:
        nop

as 1.s -o 1.o
as 2.s -o 2.o
ld 1.o 2.o
Edit: 1.s tries to read from the address of forkx. Modified 1.s below:
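/* 1.s v2 */
        .text
        .global forkx
        .global _start
_start:
        nop
        nop
load_dist:
        ldr r1, dist_forkx      /* r1 = forkx - load_addr */
        /* save lr if necessary */
        bl load_addr            /* lr = run-time address of load_addr */
load_addr:
        add r0,lr,r1            /* r0 = run-time address of forkx */
dist_forkx:
        .word forkx-load_addr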
Edit2: An even simpler version. All code untested.
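/* 1.s v3 */
        .text
        .global forkx
        .global _start
_start:
        nop
        nop
load_dist:
        ldr r1, dist_forkx      /* r1 = forkx - load_addr */
load_addr:
        adr r0, load_addr       /* r0 = run-time address of load_addr */
        add r0,r0,r1            /* r0 = run-time address of forkx */
dist_forkx:
        .word forkx-load_addr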
I have used the ldr instruction as below, and compilation goes fine with it, but it will not be ASLR compliant: the address will be fixed.
ldr r1, =__be_forkx
I will try this as well. I do not understand it 100%: with this, will I have the address of forkx in r0?
DeepakHegde said: ldr r1, =__be_forkx
That will cause a relocation to be emitted.
DeepakHegde said: I will try this also, not able to understand this 100%, with this will i have address of forkx in r0?
Yes. Instead of storing the absolute address of forkx inside a literal, it now stores the distance between the instruction that wants the address of forkx and the forkx itself. That distance must remain constant, however, across multiple runs of the same binary.
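A short Python model may make the idea easier to follow. It mimics the adr/add pair from the v3 listing with plain integer arithmetic; the two link-time offsets and the load bases are hypothetical values chosen only for illustration.

```python
# Hypothetical link-time offsets of the two labels inside the image.
LOAD_ADDR_OFFSET = 0x10
FORKX_OFFSET = 0x41030

# Value the assembler/linker stores for ".word forkx-load_addr".
DIST_FORKX = FORKX_OFFSET - LOAD_ADDR_OFFSET


def resolve_forkx(load_base):
    """Return the run-time address of forkx for an image loaded at `load_base`."""
    load_addr = load_base + LOAD_ADDR_OFFSET   # adr r0, load_addr
    return load_addr + DIST_FORKX              # add r0, r0, r1


if __name__ == "__main__":
    # Two hypothetical load bases, as if ASLR had placed the image differently.
    for base in (0x00010000, 0x76F00000):
        print(hex(base), hex(resolve_forkx(base)))
```

The stored word never changes between runs; only the pc-derived address of load_addr does, which is why no relocation is needed as long as the two labels move together.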