[bug] Unexpected optimization strategy (arm-none-eabi-gcc 15)

#include <stdint.h>
#include <stddef.h>

/**
 * Name:    CRC-32/MPEG-2  x^32+x^26+x^23+x^22+x^16+x^12+x^11+x^10+x^8+x^7+x^5+x^4+x^2+x+1
 * Poly:    0x04C11DB7
 * Init:    0xFFFFFFFF
 * Refin:   False
 * Refout:  False
 * Xorout:  0x00000000
 */
uint32_t crc32(const uint8_t *data, size_t length)
{
    uint8_t i;
    uint32_t crc = 0xffffffff; // Initial value
    while (length--) {
        crc ^= (uint32_t)(*data++) << 24;
        for (i = 0; i < 8; ++i) {
            if (crc & 0x80000000)
                crc = (crc << 1) ^ 0x04C11DB7;
            else
                crc <<= 1;
        }
    }
    return crc;
}

For this code, -Os and -Oz generate larger output than -O3. With -Os, .text + .rodata totals 1068 bytes, versus 104 bytes with -O3.

https://godbolt.org/z/MEqqK1q5T

Section sizes with -Os:

```
|     .text |    .data |     .bss |  .rodata |
|    44(+0) |    0(+0) |    0(+0) | 1024(+0) |
```

Section sizes with -O3:

```
|     .text |    .data |     .bss |  .rodata |
|   104(+0) |    0(+0) |    0(+0) |    0(+0) |
```

GCC 10.3.1 behaves as expected for the same code.
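When comparing builds across optimization flags or compiler versions, it may help to confirm that every build still computes the same CRC. The sketch below repeats the function from this report and adds a hypothetical helper, `crc32_selftest` (not part of the original code), that compares the result for the ASCII string "123456789" with 0x0376E6E7, the published check value for CRC-32/MPEG-2.

```c
#include <stdint.h>
#include <stddef.h>

/* Bitwise CRC-32/MPEG-2, same logic as the function in this report. */
uint32_t crc32(const uint8_t *data, size_t length)
{
    uint32_t crc = 0xFFFFFFFFu;              /* initial value */
    while (length--) {
        crc ^= (uint32_t)(*data++) << 24;    /* feed next byte into the top bits */
        for (uint8_t i = 0; i < 8; ++i) {
            if (crc & 0x80000000u)
                crc = (crc << 1) ^ 0x04C11DB7u;
            else
                crc <<= 1;
        }
    }
    return crc;
}

/* Hypothetical helper (an assumption, not from the report):
 * returns 1 if the build reproduces the standard check value, else 0. */
int crc32_selftest(void)
{
    static const uint8_t msg[] = "123456789";
    /* sizeof(msg) - 1 excludes the terminating NUL, leaving 9 bytes. */
    return crc32(msg, sizeof(msg) - 1) == 0x0376E6E7u;
}
```

Calling `crc32_selftest()` once in each build (for example -Os, -O3, and -Os -fno-optimize-crc) should return 1 every time if only the code size differs and not the result.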

Reply
  • I think the cause is this option: -foptimize-crc

    The Docs:

    ```

    -foptimize-crc

    Detect loops calculating CRC (performing polynomial long division) and replace them with a faster implementation. Detect 8, 16, 32, and 64 bit CRC, with a constant polynomial without the leading 1 bit, for both bit-forward and bit-reversed cases. If the target supports a CRC instruction and the polynomial used in the source code matches the polynomial used in the CRC instruction, generate that CRC instruction. Otherwise, if the target supports a carry-less-multiplication instruction, generate CRC using it; otherwise generate table-based CRC.

    Enabled by default at -O2 and higher.

    ```

    When I use `-Os -fno-optimize-crc`, the code size (.text + .rodata) goes back to normal.
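The 1024 bytes of .rodata reported for -Os are consistent with the "table-based CRC" fallback mentioned in the docs: a table of 256 entries of 4 bytes each. The following is only a hand-written sketch of that general technique, not the code GCC actually emits; the names `crc_table`, `crc32_mpeg2_init_table`, and `crc32_mpeg2_table` are made up for illustration, and the table is filled at run time here, whereas a compiler-generated table would be a constant.

```c
#include <stdint.h>
#include <stddef.h>

/* 256 entries x 4 bytes = 1024 bytes of table data. */
static uint32_t crc_table[256];

/* Precompute the effect of eight bitwise steps for every possible top byte
 * (polynomial 0x04C11DB7, most significant bit first). */
static void crc32_mpeg2_init_table(void)
{
    for (uint32_t n = 0; n < 256; ++n) {
        uint32_t c = n << 24;
        for (int bit = 0; bit < 8; ++bit)
            c = (c & 0x80000000u) ? (c << 1) ^ 0x04C11DB7u : (c << 1);
        crc_table[n] = c;
    }
}

/* Byte-at-a-time equivalent of the bitwise loop:
 * one table lookup replaces the inner 8-iteration loop. */
static uint32_t crc32_mpeg2_table(const uint8_t *data, size_t length)
{
    uint32_t crc = 0xFFFFFFFFu;              /* initial value */
    while (length--)
        crc = (crc << 8) ^ crc_table[(crc >> 24) ^ *data++];
    return crc;
}
```

The trade-off is the one visible in the section sizes: the loop body shrinks to a shift, a lookup, and an XOR, but the table costs 1 KiB, which outweighs the savings when optimizing for size.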
