The volatile modifier adds instructions that extend the variable to a 32-bit value (UXTB, UXTH, SXTB, SXTH). It doesn't make sense. Here is the code, checked in an online compiler:
#include "stdint.h"

struct _st {
    uint8_t a;
    volatile uint8_t b;
} st;

uint32_t test(uint8_t c)
{
    uint8_t out;
    if (st.a > c)
        out = st.a;
    else
        out = st.b;
    return out;
};
test:
        ldr     r2, .L3
        ldrb    r3, [r2]        // reading st.a
        cmp     r3, r0
        bhi     .L2
        ldrb    r3, [r2, #1]    // reading st.b
        uxtb    r3, r3          // <<<<<
.L2:
        mov     r0, r3
        bx      lr
.L3:
        .word   .LANCHOR0
st:
Dear AVI_crak,
On 32-bit ARM we work with 32-bit registers, so the compiler must do something with the "rest" of a uint8_t (an 8-bit unsigned integer) that you're working with.
Technically, if you always use it as an 8-bit value, the forced extension to a 32-bit value can be discarded, which is what happens when you return st.a. However, by using the volatile qualifier, you tell the compiler not to optimize this variable. Thus, the extension of the 8-bit variable to fill a 32-bit register remains.
Best Regards, Willy
Hello, thanks for your attention to the problem. I just want this business to get off the ground. You are saying that the legacy of "volatile" extends to the holding register, for the safe operation of the code below the load. Okay, so ARM probably doesn't have auto-extending load instructions? Oops, there is "ldrsb". New piece of code, old problems: godbolt.org/.../cWza83Wrz
I am still confident that "volatile" adds extra operations that can be omitted. The Clang compiler successfully uses the correct load instructions. This means that GCC should be able to do that too.
In the online compiler, you can play around with the type and version of the compiler, the type of processor used, and the compilation options. Everything at once and in one place. This is more convenient than copying to the forum.
It will be great if this problem is resolved.
Hello, can you be more specific regarding the processor being used, and the optimization level?
The default optimization (-O0) for armclang is highly un-optimized. It is generally recommended to use at least -O2.
> it must do something with the "rest" of a uint8_t

ldrb already sets the high 24 bits of the register to zero; there is nothing left to be done as long as the result is unsigned.
The godbolt link shows the uxtb instruction used regardless of optimization level (well, -O0 does something different but awful, as expected).
There's an interesting comment that shows up in the disassembly:
ldrb r3, [r2, #1] @ zero_extendqisi2
uxtb r3, r3
And there's a clue here: stackoverflow.com/.../meaning-of-zero-extendqisi2
I guess the intermediate language has a generic "fetch and zero-extend" internal instruction, and it isn't smart enough to realize that when the fetch is from memory, the extend has effectively already happened. (If the source of the fetch were a register, it would have been a mov instruction, and the uxtb would be necessary because there is no mov variant that moves a byte into a 32-bit register.)
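To make the point about the redundant extend concrete, here is a rough C model of the two instructions. The helper names (model_ldrb, model_uxtb, uxtb_is_redundant_after_ldrb) are made up for illustration and are not anything GCC or ARM provide; this only mimics the register-level effect, it says nothing about how the compiler represents the operation internally.

```c
#include <stdint.h>

/* Hypothetical model of LDRB: load one byte and zero-extend it to 32 bits. */
static uint32_t model_ldrb(const volatile uint8_t *addr)
{
    return (uint32_t)*addr;   /* upper 24 bits are zero by construction */
}

/* Hypothetical model of UXTB: keep bits 0..7, clear bits 8..31. */
static uint32_t model_uxtb(uint32_t reg)
{
    return reg & 0xFFu;
}

/* Returns 1 when UXTB leaves a freshly loaded byte unchanged. */
static int uxtb_is_redundant_after_ldrb(uint8_t byte)
{
    volatile uint8_t cell = byte;        /* stands in for st.b */
    uint32_t loaded = model_ldrb(&cell);
    return model_uxtb(loaded) == loaded;
}
```

In this model uxtb_is_redundant_after_ldrb() returns 1 for every possible byte value, while model_uxtb() still matters for a register whose upper bits may hold something else (for example, 0x12345678 becomes 0x78).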
I changed the example a little, and it turned out to be even more interesting.
The structure now contains int8_t members, which are read in two different ways.
The simple variable "a" is read by the ldrsb instruction, which automatically extends it to int32_t. This is because it is followed by the "cmp" comparison instruction, which only works on full 32-bit registers and so acts as a natural barrier.
And then "b" is read with the "volatile" modifier. An instruction for reading an unsigned variable (ldrb) is used, followed by a separate sign extension (sxtb).
#include "stdint.h"
/// -Os -mcpu=cortex-m7

struct _st {
    int8_t a;
    volatile int8_t b;
} st;

int32_t test(int32_t c)
{
    int32_t out;
    if (st.a > c)
        out = st.a;
    else
        out = st.b;
    return out;
};
test:
        ldr     r2, .L3
        ldrsb   r3, [r2]
        cmp     r3, r0
        bgt     .L1
        ldrb    r3, [r2, #1]    @ zero_extendqisi2
        sxtb    r3, r3
.L1:
        mov     r0, r3
        bx      lr
.L3:
        .word   .LANCHOR0
st:
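For the signed case, the two-instruction sequence used for "b" and the single ldrsb used for "a" should leave the same value in the register. A small C model of that, with hypothetical helper names (sketch_ldrsb, sketch_ldrb, sketch_sxtb, two_step_matches_ldrsb) invented only for this illustration:

```c
#include <stdint.h>

/* Hypothetical model of LDRSB: load one byte and sign-extend it to 32 bits. */
static int32_t sketch_ldrsb(const volatile int8_t *addr)
{
    return (int32_t)*addr;
}

/* Hypothetical model of LDRB on the same byte: zero-extend the raw bits. */
static uint32_t sketch_ldrb(const volatile int8_t *addr)
{
    return (uint32_t)(uint8_t)*addr;
}

/* Hypothetical model of SXTB: sign-extend bits 0..7 into a 32-bit value. */
static int32_t sketch_sxtb(uint32_t reg)
{
    int32_t low = (int32_t)(reg & 0xFFu);   /* 0..255 */
    return (low ^ 0x80) - 0x80;             /* map 128..255 to -128..-1 */
}

/* Returns 1 when ldrb + sxtb gives the same result as a single ldrsb. */
static int two_step_matches_ldrsb(int8_t byte)
{
    volatile int8_t cell = byte;            /* stands in for st.b */
    return sketch_sxtb(sketch_ldrb(&cell)) == sketch_ldrsb(&cell);
}
```

In this model the two paths agree for every int8_t value, which is consistent with the idea that the separate sxtb could in principle be folded into the load. Whether GCC is allowed or willing to do that for a volatile access is a separate question that this sketch does not answer.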