I have found a severe defect in the ARM compiler for Cortex-M3. It occurs only in the release build.
And this is the compilation result. Only 8 bytes are cleared, using STR.
0x00013F06 F88D5008 STRB r5,[sp,#0x08]
23: UINT8 data[9];
64: continue;
65: }
66:
0x00013F42 E00F B 0x00013F64
67: memset(data,0, sizeof(data));
70:
0x00013F44 9504 STR r5,[sp,#0x10]
0x00013F46 9505 STR r5,[sp,#0x14]
71: if(ReadBlockFromSRAM(data, addr, sizeof(data)))
72: {
0x00013F48 2209 MOVS r2,#0x09
=============================================================
When I use: x2 = sizeof(data); memset(data,0,x2);
It works fine.
int8_t buffer[7] ;
memset(buffer, 4, sizeof(buffer) ) ;
memset(buffer, 1, sizeof(buffer) ) ;
memset(buffer, 0, sizeof(buffer) ) ;
Because the last memset addresses THE STACK using STR (not STRB), it will indeed overwrite one extra byte...! Tested on a CM0 (LPC1114) device. Note that making the buffer size a power of 2 will solve the problem (of course...).
"Because the last memset addresses THE STACK using STR (not STRB), it will indeed overwrite one extra byte...! Tested on a CM0 (LPC1114) device."
Is that actually a problem?
<speculate> The compiler would probably have aligned the 7 byte array to a multiple of 4 (or is it 8?). It knows what it has done, so the compiler writer's logic might be that clearing an extra (unused) byte should not be an issue and would be a faster operation. </speculate>
Can you post a more complete sample? Because the one you posted is too short, it doesn't prove anything. Keep in mind that at high optimization levels the compiler can transform code in non-obvious ways. You can begin to blame the compiler only when your program output is incorrect.
Please ignore this post. It seems that the last byte is cleared somewhere else.
0x00013F40 9504 STR r5,[sp,#0x10]
0x00013F42 9505 STR r5,[sp,#0x14]
71: if(ReadBlockFromSRAM(data, addr, sizeof(data)))
72: {
0x00013F44 2209 MOVS r2,#0x09
0x00013F46 4631 MOV r1,r6
0x00013F48 A804 ADD r0,sp,#0x10
0x00013F4A 9506 STR r5,[sp,#0x18]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
0x00013F4C F7FDFB82 BL.W ReadBlockFromSRAM (0x00011654)
0x00013F50 B150 CBZ r0,0x00013F68
73: if(Err++>10) break;
I am not allowed to erase the whole thread. It looks strange that 4 bytes are cleared by the last store, but it seems OK.
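The arithmetic behind the listings can be sketched as follows. This is only an illustration, assuming the array starts on a 4-byte boundary and the compiler clears it with 32-bit STR instructions only; the function names are hypothetical and not part of any toolchain.

```c
#include <assert.h>
#include <stddef.h>

/* Number of 32-bit word stores (STR) needed to cover n bytes,
   assuming the array starts on a 4-byte boundary. */
size_t word_stores_needed(size_t n)
{
    return (n + 3u) / 4u;   /* round up to whole words */
}

/* Bytes actually written when only word stores are used. */
size_t bytes_written(size_t n)
{
    return 4u * word_stores_needed(n);
}

/* Check the two arrays discussed in this thread. */
void check_thread_examples(void)
{
    assert(word_stores_needed(9) == 3);  /* sp+0x10, sp+0x14, sp+0x18 */
    assert(bytes_written(9) == 12);      /* 3 bytes beyond data[9]    */
    assert(word_stores_needed(7) == 2);  /* sp+0x04, sp+0x08          */
    assert(bytes_written(7) == 8);       /* 1 byte beyond buffer[7]   */
}
```

So a 9-byte array needs three word stores, which matches the third STR at [sp,#0x18] that was scheduled after the first two.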
Here is what my compiler (
IDE-Version: µVision V4.14.4.0
Copyright (C) 2010 KEIL(TM) Tools by ARM
License Information: tamir michael cmc LIC=WHY4D-85I2L-ZC4J9-244N7-FB8KA-HBEDB
Tool Version Numbers:
Toolchain: RealView MDK-ARM Version: 4.14
Middleware: RL-ARM Real-Time Library Version V4.13
Toolchain Path: BIN40\
C Compiler: Armcc.Exe V4.1.0.567
Assembler: Armasm.Exe V4.1.0.567
Linker/Locator: ArmLink.Exe V4.1.0.567
Librarian: ArmAr.Exe V4.1.0.567
Hex Converter: FromElf.Exe V4.1.0.567
CPU DLL: SARMCM3.DLL V4.14
Dialog DLL: DARMP1.DLL V1.20.0.4
Target DLL: BIN\UL2CM3.DLL V1.80
Dialog DLL: TARMP1.DLL V1.20.0.3
) has generated:
176: int32_t main(void)
0x00001BFA BD70 POP {r4-r6,pc}
177: {
178: int8_t buffer[7] ;
179:
0x00001BFC B50E PUSH {r1-r3,lr}
180: memset(buffer, 4, sizeof(buffer) ) ;
0x00001BFE 2204 MOVS r2,#0x04
0x00001C00 2107 MOVS r1,#0x07
0x00001C02 A801 ADD r0,sp,#0x04
0x00001C04 F001F9F8 BL.W __aeabi_memset (0x00002FF8)
181: memset(buffer, 1, sizeof(buffer) ) ;
0x00001C08 2201 MOVS r2,#0x01
0x00001C0A 2107 MOVS r1,#0x07
0x00001C0C A801 ADD r0,sp,#0x04
0x00001C0E F001F9F3 BL.W __aeabi_memset (0x00002FF8)
182: memset(buffer, 0, sizeof(buffer) ) ;
183: // uint32_t i, j, k, l_start_time ;
184:
185: #ifdef WDT_ENABLED
186: wdt_init(WDT_PERIOD_10_SECONDS) ;
187: #endif
188:
0x00001C12 2000 MOVS r0,#0x00
0x00001C14 9001 STR r0,[sp,#0x04]
0x00001C16 9002 STR r0,[sp,#0x08]
189: remap_interrupt_vectors() ;
The compiler would probably have aligned the 7 byte array to a multiple of 4 (or is it 8?). It knows what it has done, so the compiler writer's logic might be that clearing an extra (unused) byte should not be an issue and would be a faster operation.
That makes perfect sense, but the behavior I observed with the little sample I posted is as follows:
First, the value 4 is written to 7 bytes of the buffer (leaving the last, 8th byte unaltered). Then, the value 1 is written to 7 (not 8!) bytes of the buffer (again leaving the 8th byte unaltered). Finally, 8 (not 7) bytes are cleared to 0. Why the difference?
Is this still consistent? I am not sure this is a tool chain issue but it looks strange to me.
It could be that the last byte is indeed allocated and that no problem really exists. Actually, it smells like it - but further tests are needed to make sure nothing is placed at the 8th byte that is overwritten!
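One way to run such a test is to force a live object into the 8th byte and check that it survives. The sketch below is only an assumption about how that could be done: the struct, the guard value 0xA5 and the function name are all hypothetical. Because the guard is a real member whose value is read afterwards, a correct compiler must preserve it, whatever store width it picks for the memset calls.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical test layout: a guard byte placed directly after the buffer. */
struct guarded_buffer {
    int8_t  buffer[7];
    uint8_t guard;       /* occupies the "8th byte" */
};

/* Returns 1 if the buffer is cleared and the guard byte is intact. */
int guard_survives_memset(void)
{
    struct guarded_buffer g;
    size_t i;

    /* Both members have alignment 1, so no padding is expected. */
    assert(sizeof(struct guarded_buffer) == 8);

    g.guard = 0xA5;
    memset(g.buffer, 4, sizeof(g.buffer));
    memset(g.buffer, 1, sizeof(g.buffer));
    memset(g.buffer, 0, sizeof(g.buffer));

    for (i = 0; i < sizeof(g.buffer); i++) {
        if (g.buffer[i] != 0) {
            return 0;    /* buffer was not fully cleared */
        }
    }
    return g.guard == 0xA5;
}
```

At high optimization levels the compiler may fold the whole check into a constant, so the disassembly is still worth a look alongside the return value.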
It's not a problem at all. As long as the program output is correct, there is nothing to worry about. You will discover more strange things when studying compiler output as compilers perform more advanced optimizations with time. It only becomes a problem when a program malfunctions at high optimization levels and you suspect a compiler bug: pinpointing the bug can be extremely hard.
Tamir: 1) You shouldn't post your license key.
2) memset with value zero may be recognized by the compiler and handled specially, since the value zero is very special indeed.
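If the special handling of a zero fill gets in the way while studying the generated code, a byte-wise clear through a volatile pointer is one possible comparison point. This is a sketch with a hypothetical helper name, not something used in the thread. In practice compilers emit one byte store per element for volatile accesses instead of merging them into word stores, at the cost of slower code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: clears n bytes one at a time through a volatile
   pointer, so the stores are not combined into wider ones. */
void clear_bytes_volatile(void *dst, size_t n)
{
    volatile uint8_t *p = (volatile uint8_t *)dst;

    assert(dst != NULL || n == 0);

    while (n--) {
        *p++ = 0;
    }
}
```

Calling `clear_bytes_volatile(buffer, sizeof(buffer))` in place of the last memset should leave the byte after the buffer untouched, which makes it easier to tell whether the extra byte matters to the rest of the program.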
Per,
Oops...
Just to clear things up: I am debugging an I2C peripheral. The I2C transfer fails only when the transfer size is 9 bytes, and only in the release build. The debug build is 100% OK.
My mistake was to suspect the compiler of generating bad code. The most probable explanation is that the release code runs faster somewhere.
Generally I love failures that manifest only in release builds. They are very hard to find.