
Defect inside ARM compiler intrinsic memset function

I have found a severe defect in the ARM compiler for Cortex-M3. It occurs only in release builds.

This is the compilation result. Only 8 bytes are cleared, using two STR instructions.

0x00013F06 F88D5008 STRB r5,[sp,#0x08]
23: UINT8 data[9];

64: continue;
65: }
66:
0x00013F42 E00F B 0x00013F64

67: memset(data,0, sizeof(data));
70:
0x00013F44 9504 STR r5,[sp,#0x10]
0x00013F46 9505 STR r5,[sp,#0x14]

71: if(ReadBlockFromSRAM(data, addr, sizeof(data)))
72: {
0x00013F48 2209 MOVS r2,#0x09

=============================================================

When I use:
x2 = sizeof(data);
memset(data,0,x2);

It works fine.

  •     int8_t buffer[7] ;
    
        memset(buffer, 4, sizeof(buffer) ) ;
        memset(buffer, 1, sizeof(buffer) ) ;
        memset(buffer, 0, sizeof(buffer) ) ;
    

    Because the last memset addresses the stack using STR (not STRB), it will indeed overwrite one extra byte! Tested on a CM0 (LPC1114) device.
    Note that making the buffer a power of 2 will solve the problem (of course...).

  • "Because the last memset addresses the stack using STR (not STRB), it will indeed overwrite one extra byte! Tested on a CM0 (LPC1114) device."

    Is that actually a problem?

    <speculate>
    The compiler would probably have aligned the 7 byte array to a multiple of 4 (or is it 8?). It knows what it has done, so the compiler writer's logic might be that clearing an extra (unused) byte should not be an issue and would be a faster operation.
    </speculate>

  • Can you post a more complete sample? Because the one you posted is too short, it doesn't prove anything.
    Keep in mind that at high optimization levels the compiler can transform code in non-obvious ways. You can begin to blame the compiler only when your program output is incorrect.

  • Please ignore this post. It seems that the last byte is cleared elsewhere, by a third STR placed after the argument setup:

    0x00013F40 9504 STR r5,[sp,#0x10]
    0x00013F42 9505 STR r5,[sp,#0x14]
    71: if(ReadBlockFromSRAM(data, addr, sizeof(data)))
    72: {
    0x00013F44 2209 MOVS r2,#0x09
    0x00013F46 4631 MOV r1,r6
    0x00013F48 A804 ADD r0,sp,#0x10
    0x00013F4A 9506 STR r5,[sp,#0x18]
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    0x00013F4C F7FDFB82 BL.W ReadBlockFromSRAM (0x00011654)
    0x00013F50 B150 CBZ r0,0x00013F68
    73: if(Err++>10) break;

    I am not allowed to delete the whole thread. It looks strange that 4 bytes are cleared for the last byte, but it seems OK.
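
The store counts in the listings in this thread match simple round-up arithmetic: three STR instructions for the 9-byte array and two for the 7-byte one. The sketch below only reproduces that arithmetic. The helper names `word_stores_for` and `bytes_written_for` are hypothetical, and the rule is an assumption read off the listings, not a documented armcc guarantee.

```c
#include <stddef.h>

/* Hypothetical helper: number of 32-bit stores needed to cover n bytes,
 * assuming the compiler clears the buffer with whole-word STR instructions. */
size_t word_stores_for(size_t n)
{
    return (n + 3u) / 4u;   /* round up to the next multiple of 4 bytes */
}

/* Hypothetical helper: bytes actually written by those word stores. */
size_t bytes_written_for(size_t n)
{
    return word_stores_for(n) * 4u;
}
```

Under this assumption, `UINT8 data[9]` needs 3 stores covering 12 bytes (sp+0x10, sp+0x14, sp+0x18), and `int8_t buffer[7]` needs 2 stores covering 8 bytes.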

  • Here is what my compiler (

    IDE-Version:
    µVision V4.14.4.0
    Copyright (C) 2010 KEIL(TM) Tools by ARM
    
    License Information:
    tamir michael
    cmc
    LIC=WHY4D-85I2L-ZC4J9-244N7-FB8KA-HBEDB
    
    Tool Version Numbers:
    Toolchain:        RealView MDK-ARM  Version: 4.14
    Middleware:        RL-ARM Real-Time Library Version V4.13
    Toolchain Path:    BIN40\ 
    C Compiler:         Armcc.Exe       V4.1.0.567
    Assembler:          Armasm.Exe       V4.1.0.567
    Linker/Locator:     ArmLink.Exe       V4.1.0.567
    Librarian:             ArmAr.Exe       V4.1.0.567
    Hex Converter:      FromElf.Exe       V4.1.0.567
    CPU DLL:               SARMCM3.DLL       V4.14
    Dialog DLL:         DARMP1.DLL       V1.20.0.4
    Target DLL:             BIN\UL2CM3.DLL       V1.80
    Dialog DLL:         TARMP1.DLL       V1.20.0.3
    
    

    ) has generated:

       176: int32_t main(void)
    0x00001BFA BD70      POP      {r4-r6,pc}
       177: {
       178:     int8_t buffer[7] ;
       179:
    0x00001BFC B50E      PUSH     {r1-r3,lr}
       180:     memset(buffer, 4, sizeof(buffer) ) ;
    0x00001BFE 2204      MOVS     r2,#0x04
    0x00001C00 2107      MOVS     r1,#0x07
    0x00001C02 A801      ADD      r0,sp,#0x04
    0x00001C04 F001F9F8  BL.W     __aeabi_memset (0x00002FF8)
       181:     memset(buffer, 1, sizeof(buffer) ) ;
    0x00001C08 2201      MOVS     r2,#0x01
    0x00001C0A 2107      MOVS     r1,#0x07
    0x00001C0C A801      ADD      r0,sp,#0x04
    0x00001C0E F001F9F3  BL.W     __aeabi_memset (0x00002FF8)
       182:     memset(buffer, 0, sizeof(buffer) ) ;
       183:         // uint32_t i, j, k, l_start_time ;
       184:
       185: #ifdef WDT_ENABLED
       186:     wdt_init(WDT_PERIOD_10_SECONDS) ;
       187: #endif
       188:
    0x00001C12 2000      MOVS     r0,#0x00
    0x00001C14 9001      STR      r0,[sp,#0x04]
    0x00001C16 9002      STR      r0,[sp,#0x08]
       189:     remap_interrupt_vectors() ;
    

  • "The compiler would probably have aligned the 7 byte array to a multiple of 4 (or is it 8?). It knows what it has done, so the compiler writer's logic might be that clearing an extra (unused) byte should not be an issue and would be a faster operation."

    Makes perfect sense, but the behavior I observed with the little sample I posted is as follows:

    First, the value 4 is written 7 times to the buffer (not altering the last, 8th byte).
    Then, the value 1 is written 7 (not 8!) times to the buffer (again not altering the 8th byte).
    Finally, 8 (not 7) bytes are cleared to 0.
    Why the difference?

    Is this still consistent? I am not sure this is a tool chain issue but it looks strange to me.

  • It could be that the last byte is indeed allocated and that no problem really exists. Actually, it smells like it - but further tests are needed to make sure nothing is placed at the 8th byte that is overwritten!
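
One possible form of such a test, as a minimal sketch: place a guard byte directly after the 7-byte buffer and check that it survives the three memset calls. The struct `guarded_buffer`, the function `guard_survives_memset`, and the pattern 0xA5 are hypothetical names and values chosen for illustration. This checks observable behavior only; it does not show how armcc lays out a plain local array on the stack.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical test layout: the guard sits in the "8th byte",
 * immediately after the 7-byte buffer. */
struct guarded_buffer {
    int8_t  buffer[7];
    uint8_t guard;
};

/* Returns 1 if the guard byte is intact and the buffer is all zero
 * after the same three memset calls used in the sample, 0 otherwise. */
int guard_survives_memset(void)
{
    struct guarded_buffer g;
    size_t i;

    g.guard = 0xA5;   /* arbitrary marker value */

    memset(g.buffer, 4, sizeof(g.buffer));
    memset(g.buffer, 1, sizeof(g.buffer));
    memset(g.buffer, 0, sizeof(g.buffer));

    for (i = 0; i < sizeof(g.buffer); i++) {
        if (g.buffer[i] != 0) {
            return 0;   /* buffer was not fully cleared */
        }
    }
    return g.guard == 0xA5;   /* guard must not be overwritten */
}
```

If this returns 0 in a release build, there is a real problem worth reporting. If it returns 1, the extra word store seen in the listing most likely lands on padding that nothing else uses.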

  • "Is this still consistent? I am not sure this is a tool chain issue but it looks strange to me."

    It's not a problem at all. As long as the program output is correct, there is nothing to worry about. You will discover more strange things when studying compiler output as compilers perform more advanced optimizations with time. It only becomes a problem when a program malfunctions at high optimization levels and you suspect a compiler bug: pinpointing the bug can be extremely hard.

  • Tamir:
    1) You shouldn't post your license key.

    2) memset with value zero may be recognized by the compiler and handled specially, since the value zero is very special indeed.

  • Just to clear things up: I am debugging an I2C peripheral, and the I2C transfer fails only when the transfer size is 9 bytes and only in the release build. The debug build is 100% OK.

    My mistake was to suspect the compiler of generating bad code. The most probable explanation is that the release code runs faster somewhere.

    Generally I love failures that manifest only in release builds. They are very hard to find.