I am used to gcc optimizing away the sort of "for (i=0; i<DELAYCOUNT; i++) ;" loops that people sometimes try to use for delays.
But arm gcc seems to be very inconsistent in this area.
With the following code, arm-gcc version 5.4, 6, 8, 9, or 10 at -Os, -O2, or -O3 optimizes away the loop in delay(), but NOT the for loop in main():
void delay() {
    for (int i=0; i < 9000000; i++) {}
}

int main() {
    while(1) {
        for(int i=0; i<9000000; i++){}  //Run a few cycles doing nothing
    }
}
arm gcc 7 optimizes away both loops. g++ optimizes away both loops.
From gcc 10:

/Downloads/gcc-arm-10/bin/arm-none-eabi-gcc -mcpu=cortex-m0 -mthumb -g -Os -Wall -Wextra loop.c -c; arm-objdump -S loop.o

loop.o:     file format elf32-littlearm

Disassembly of section .text:

00000000 <delay>:
void delay() {
    for (int i=0; i < 9000000; i++) {}
}
   0:   4770        bx      lr

Disassembly of section .text.startup:

00000000 <main>:
int main() {
   0:   4b02        ldr     r3, [pc, #8]    ; (c <main+0xc>)
    while(1) {
        for(int i=0; i<9000000; i++){}  //Run a few cycles doing nothing
   2:   3b01        subs    r3, #1
   4:   2b00        cmp     r3, #0
   6:   d1fc        bne.n   2 <main+0x2>
   8:   e7fa        b.n     0 <main>
   a:   46c0        nop                     ; (mov r8, r8)
   c:   00895440    .word   0x00895440
(I'm not happy about the extra "cmp" instruction, either. The subs will already have set the flags. With -mcpu=cortex-m4 it does better.)
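For comparison, here is a minimal sketch of the two forms of busy loop that I would expect the compiler to keep at any optimization level. The function names delay_volatile and delay_asm are made up for illustration, and the second one assumes GCC-style inline asm. Neither says anything about how long the loop actually takes; they only keep it from being deleted.

```c
#include <stdint.h>

/* Hypothetical helper: the counter is volatile, so every read and write
 * of it counts as an observable access and the loop cannot be removed. */
uint32_t delay_volatile(uint32_t count)
{
    volatile uint32_t i;
    for (i = 0; i < count; i++) {
        /* empty on purpose */
    }
    return i;
}

/* Hypothetical helper, GCC-specific: the empty volatile asm statement
 * must be executed once per iteration, so the loop around it stays. */
uint32_t delay_asm(uint32_t count)
{
    uint32_t i;
    for (i = 0; i < count; i++) {
        __asm__ volatile ("" ::: "memory");
    }
    return i;
}
```

The volatile version usually keeps the counter on the stack, so it costs a load and a store per iteration. The asm version can keep the counter in a register, which is closer to what the plain loop in main() compiled to above.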