Code executes significantly faster when optimized with -Os than with -O3/-Ofast

This is most likely more of a beginner question. I'm struggling to benchmark this MVE-vectorizable function, taken from the CMSIS-NN library:

where `REF_VALUES` is an array of 1280 random values.

Compiler version: arm-none-eabi-gcc (GNU Arm Embedded Toolchain 10-2020-q4-major) 10.2.1 20201103 (release)

The compiler flags are: -DARMCM55 -mcpu=cortex-m55 -mthumb -mfloat-abi=hard -Os -std=c99 -ffunction-sections -fdata-sections

When run on the Corstone300 MPS2 FVP, this reports 1018 cycles. When I change the optimization level to -O3, the reported cycles rise to 2519. Here is a list of reported cycles for other optimization levels:

-Os: 1018
-O1: 1031
-O2: 2505
-O3: 2519
-Ofast: 2519
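
To put these numbers in relation to each other, here is a minimal sketch that computes each level's slowdown relative to the -Os result. The cycle counts are the ones listed above; the names `cycle_result`, `results`, `slowdown` and `print_slowdowns` are made up for this illustration and are not part of CMSIS-NN or the benchmark itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* One measurement: the optimization flag and the cycles reported by the FVP. */
struct cycle_result {
    const char *flag;
    unsigned cycles;
};

/* Cycle counts copied from the list above; the first entry (-Os) is the baseline. */
const struct cycle_result results[] = {
    {"-Os", 1018},
    {"-O1", 1031},
    {"-O2", 2505},
    {"-O3", 2519},
    {"-Ofast", 2519},
};

#define NUM_RESULTS (sizeof results / sizeof results[0])

/* Hypothetical helper: how many times slower `cycles` is than `baseline`. */
double slowdown(unsigned cycles, unsigned baseline)
{
    assert(baseline != 0); /* avoid dividing by zero */
    return (double)cycles / (double)baseline;
}

/* Print every level with its slowdown relative to the first entry (-Os). */
void print_slowdowns(void)
{
    for (size_t i = 0; i < NUM_RESULTS; i++) {
        printf("%-7s %5u cycles  %.2fx\n", results[i].flag, results[i].cycles,
               slowdown(results[i].cycles, results[0].cycles));
    }
}
```

Calling `print_slowdowns()` from `main` lists all five levels. The results fall into two groups: -Os and -O1 are within about 1% of each other, while -O2, -O3 and -Ofast all take roughly 2.5 times as many cycles.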

I have checked the generated assembly, and the inner loop looks identical across all versions to me. I would be very interested in what could cause this steep drop in performance at higher optimization levels, because it seems very counterintuitive.
