I am used to gcc optimizing away the sort of "for (i=0; i<DELAYCOUNT; i++) ;" loops that people sometimes try to use for delays.
But arm gcc seems to be very inconsistent in this area.
The following code, compiled with arm-gcc version 5.4, 6, 8, 9, or 10 at -Os, -O2, or -O3, has the loop in delay() optimized away, but NOT the for loop in main():

    void delay() {
        for (int i=0; i < 9000000; i++) {}
    }

    int main() {
        while(1) {
            for(int i=0; i<9000000; i++){}  //Run a few cycles doing nothing
        }
    }
arm gcc 7 optimizes away both loops. g++ optimizes away both loops.
From gcc 10:

    /Downloads/gcc-arm-10/bin/arm-none-eabi-gcc -mcpu=cortex-m0 -mthumb -g -Os -Wall -Wextra loop.c -c; arm-objdump -S loop.o

    loop.o:     file format elf32-littlearm

    Disassembly of section .text:

    00000000 <delay>:
    void delay() {
        for (int i=0; i < 9000000; i++) {}
    }
       0:   4770        bx      lr

    Disassembly of section .text.startup:

    00000000 <main>:
    int main() {
       0:   4b02        ldr     r3, [pc, #8]    ; (c <main+0xc>)
        while(1) {
            for(int i=0; i<9000000; i++){}  //Run a few cycles doing nothing
       2:   3b01        subs    r3, #1
       4:   2b00        cmp     r3, #0
       6:   d1fc        bne.n   2 <main+0x2>
       8:   e7fa        b.n     0 <main>
       a:   46c0        nop                     ; (mov r8, r8)
       c:   00895440    .word   0x00895440
(I'm not happy about the extra "cmp" instruction, either. The subs will already have set the flags. With -mcpu=cortex-m4 it does better.)
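For contrast with the empty loops above, here is a minimal sketch of a delay loop that the optimizer is not allowed to remove. It is not part of the original test case. The function name delay_loop and its count parameter are hypothetical, and the approach (a volatile loop counter) is just one common way to keep the loop, not necessarily the best one for a given target.

```c
#include <stdint.h>

/* Hypothetical helper: busy-wait for `count` iterations.
 * Because the counter is volatile, every read and write of it is an
 * observable side effect, so the compiler must keep the loop even at
 * -Os, -O2, or -O3. Returns the final counter value (equal to `count`). */
uint32_t delay_loop(uint32_t count)
{
    volatile uint32_t i;

    for (i = 0; i < count; i++) {
        /* intentionally empty */
    }
    return i;
}
```

A call such as delay_loop(9000000) would then run all nine million iterations regardless of optimization level. How long that takes still depends on the core clock and on the code the compiler generates for the volatile accesses, so it is not a calibrated delay.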
Hmm, this looks like a conditional dead-code elimination bug, but it appears to have started with GCC 8, not 10.
This is a generic bug, so it has been reported upstream: gcc.gnu.org/.../show_bug.cgi