Hi
I am using MDK5.11a on an STM32F050. I came across a hard fault and eventually found that the following construct causes the problem.
The following is just to demonstrate the fault.
uint8_t array[30] ;
uint32_t *intptr ;
uint8_t i ;
.....
.....
for(i = 0 ; i <= 10 ; i++)
{
    intptr = (uint32_t*)&array[i] ;
    // then access via the integer pointer
    if(*intptr == 0)
    {
        ....
    }
}
Sure enough, sooner or later it crashes with a hard fault.
The reason I access 32 bits this way is that I receive a stream of data from the USART and put it into the array. Depending on the header bytes, data bytes further down the stream are interpreted as either uint16_t or uint32_t. Casting the address of the first byte to uint32_t* and reading it as 32 bits causes a problem if the 4 bytes are not aligned correctly, which in my case is very likely.
The easy way is to copy byte by byte (memcpy) into a uint32_t variable and then process that variable. But is there an easier way (compiler directives)? As the construct is C compliant, I would have thought it is the compiler's job to deal with this automatically, without the user needing to know about this cross-boundary problem.
Rgds
Calvin
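For reference, the memcpy approach mentioned in the question can be sketched as below. This is only a minimal illustration: the helper name read_u32_unaligned is hypothetical and not from the thread, and the result uses the CPU's native byte order, so it only matches the stream if both ends agree on byte ordering.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (name not from the thread): read a 32-bit value
   from any byte address by copying it into an aligned local first. */
static uint32_t read_u32_unaligned(const uint8_t *p)
{
    uint32_t value;

    /* memcpy works on bytes, so p does not need to be 4-byte aligned. */
    memcpy(&value, p, sizeof value);

    /* The bytes are interpreted in the CPU's native byte order. */
    return value;
}
```

With this, the loop body would become something like `if (read_u32_unaligned(&array[i]) == 0) { ... }` instead of dereferencing a cast pointer.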
So: don't do that, then!
As already mentioned, make yourself a get16() and a get32() function to extract bytes from the stream & build them back into 16- or 32-bit words.
That will also deal with the possibility that the stream has the wrong byte ordering...
Google "serialisation"...
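A minimal sketch of what such get16() and get32() functions might look like is given below. It assumes the stream sends multi-byte values least significant byte first; that byte order is an assumption, not something stated in the thread, so swap the indices if your protocol is big-endian.

```c
#include <stdint.h>

/* Assumption: the stream is little-endian (least significant byte first).
   Only single bytes are read, so p may point at any address. */
static uint16_t get16(const uint8_t *p)
{
    return (uint16_t)(p[0] | ((uint16_t)p[1] << 8));
}

static uint32_t get32(const uint8_t *p)
{
    return  (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

Because the value is built from shifts rather than from the memory layout, the result is the same on any CPU, whatever its native byte order or alignment rules.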
Hi all
I take your point, and will be cautious about this.
I come from the 8-bit MCU world, so this is an area that I need to be aware of.
As a matter of interest, I ran the test on the Keil MDK5.11a simulator for the STM32F030 and the STM32F401RE, in both cases with and without __packed. On the STM32F4, both code snippets work fine without a hard fault. On the STM32F0, the one without __packed crashes.
I am not familiar with the ARM architecture or assembly, but I managed to generate the corresponding assembly code for your reference. May be you guys are interested to explain why this is the case.
(note: with or without __packed)

volatile uint32_t variable ;

static void testptr(void)
{
    static const uint8_t array[] = {0x00, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88, 0x99, 0xaa, 0xbb, 0xcc, 0xdd, 0xee, 0xff} ;
    uint8_t i ;
    for(i = 0 ; i <= sizeof(array) - sizeof(uint32_t) ; i++)
    {
        variable = *(__packed uint32_t *)&array[i] ;
    }
}

--------------
M0 without __packed

                  testptr PROC
;;;167
;;;168    static void testptr(void)
000000  2000              MOVS     r0,#0
;;;169    { static const uint8_t array[] = {0x00, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88, 0x99, 0xaa, 0xbb, 0xcc, 0xdd, 0xee, 0xff} ;
;;;170    uint8_t i ; for(i = 0 ; i <= sizeof(array) - sizeof(uint32_t) ; i++)
000002  e005              B        |L8.16|
                  |L8.4|
;;;171    { variable = *(uint32_t *)&array[i] ;
000004  4904              LDR      r1,|L8.24|
000006  5809              LDR      r1,[r1,r0]
000008  4a04              LDR      r2,|L8.28|
00000a  6011              STR      r1,[r2,#0]  ; variable
00000c  1c41              ADDS     r1,r0,#1              ;170
00000e  b2c8              UXTB     r0,r1                 ;170
                  |L8.16|
000010  280c              CMP      r0,#0xc               ;170
000012  d9f7              BLS      |L8.4|
;;;172    }
;;;173    }
000014  4770              BX       lr
;;;174
                          ENDP

M0 with __packed

                  testptr PROC
;;;167
;;;168    static void testptr(void)
000000  b510              PUSH     {r4,lr}
;;;169    { static const uint8_t array[] = {0x00, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88, 0x99, 0xaa, 0xbb, 0xcc, 0xdd, 0xee, 0xff} ;
;;;170    uint8_t i ; for(i = 0 ; i <= sizeof(array) - sizeof(uint32_t) ; i++)
000002  2400              MOVS     r4,#0
000004  e007              B        |L8.22|
                  |L8.6|
;;;171    { variable = *(__packed uint32_t *)&array[i] ;
000006  4905              LDR      r1,|L8.28|
000008  1908              ADDS     r0,r1,r4
00000a  f7fffffe          BL       __aeabi_uread4
00000e  4904              LDR      r1,|L8.32|
000010  6008              STR      r0,[r1,#0]  ; variable
000012  1c60              ADDS     r0,r4,#1              ;170
000014  b2c4              UXTB     r4,r0                 ;170
                  |L8.22|
000016  2c0c              CMP      r4,#0xc               ;170
000018  d9f5              BLS      |L8.6|
;;;172    }
;;;173    }
00001a  bd10              POP      {r4,pc}
;;;174
                          ENDP

M4 without __packed

                  testptr PROC
;;;81
;;;82     static void testptr(void)
0000f2  2000              MOVS     r0,#0
;;;83     { static const uint8_t array[] = {0x00, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88, 0x99, 0xaa, 0xbb, 0xcc, 0xdd, 0xee, 0xff} ;
;;;84     uint8_t i ; for(i = 0 ; i <= sizeof(array) - sizeof(uint32_t) ; i++)
0000f4  e005              B        |L1.258|
                  |L1.246|
;;;85     { variable = *(uint32_t *)&array[i] ;
0000f6  4936              LDR      r1,|L1.464|
0000f8  5809              LDR      r1,[r1,r0]
0000fa  4a36              LDR      r2,|L1.468|
0000fc  6011              STR      r1,[r2,#0]  ; variable
0000fe  1c41              ADDS     r1,r0,#1              ;84
000100  b2c8              UXTB     r0,r1                 ;84
                  |L1.258|
000102  280c              CMP      r0,#0xc               ;84
000104  d9f7              BLS      |L1.246|
;;;86     }
;;;87     }
000106  4770              BX       lr
;;;88
                          ENDP

M4 with __packed

                  testptr PROC
;;;81
;;;82     static void testptr(void)
0000f2  2000              MOVS     r0,#0
;;;83     { static const uint8_t array[] = {0x00, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88, 0x99, 0xaa, 0xbb, 0xcc, 0xdd, 0xee, 0xff} ;
;;;84     uint8_t i ; for(i = 0 ; i <= sizeof(array) - sizeof(uint32_t) ; i++)
0000f4  e006              B        |L1.260|
                  |L1.246|
;;;85     { variable = *(__packed uint32_t *)&array[i] ;
0000f6  4937              LDR      r1,|L1.468|
0000f8  4401              ADD      r1,r1,r0
0000fa  6809              LDR      r1,[r1,#0]
0000fc  4a36              LDR      r2,|L1.472|
0000fe  6011              STR      r1,[r2,#0]  ; variable
000100  1c41              ADDS     r1,r0,#1              ;84
000102  b2c8              UXTB     r0,r1                 ;84
                  |L1.260|
000104  280c              CMP      r0,#0xc               ;84
000106  d9f6              BLS      |L1.246|
;;;86     }
;;;87     }
000108  4770              BX       lr
;;;88
                          ENDP
------------
This is a bad habit that you (and many others) just happened to get away with in the 8-bit world!
"May be you guys are interested to explain why this is the case"
As already noted, the language specification states that the behaviour is undefined - so there doesn't have to be any rhyme or reason to it. There isn't really any benefit to analysing it - just don't do it!
It is certainly not something that you should rely upon!
Google "Cortex M4 unaligned access" if you want some information.
Like: infocenter.arm.com/.../index.jsp
As already mentioned, some processors have hardware to handle unaligned access, which means the compiler need not do black magic, so there is no extra code space bloat. It is still slower than aligned access, because the hardware must perform multiple memory accesses to get the upper and lower parts of the integer and merge them.
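For the M0 with __packed, the compiler called a helper function (__aeabi_uread4 in the listing) to do the 32-bit unaligned read. Without __packed it emitted a plain LDR, which faults on an unaligned address because the M0 has no hardware support for unaligned access.
For the M4, there was no need for a helper function, since the plain LDR used in both M4 listings can handle an unaligned address in hardware.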
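To see which accesses are affected, a small check like the following can help. It is only a sketch: the helper name is_aligned4 is hypothetical, and testing the low bits of a pointer converted to uintptr_t is implementation-defined in C, although it behaves as expected with the flat addressing used on Cortex-M parts.

```c
#include <stdint.h>

/* Hypothetical helper: returns nonzero if p is suitably aligned
   for a 32-bit access, i.e. its address is a multiple of 4. */
static int is_aligned4(const void *p)
{
    return ((uintptr_t)p & 3u) == 0;
}
```

In the test loop above, only one in every four consecutive values of &array[i] passes this check, so on the M0 the plain LDR version faults as soon as i reaches an address that is not a multiple of 4.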
"This is a bad habit that you (and many others) just happened to get away with in the 8-bit world!"
The first time I programmed on Sun workstations, I couldn't understand why my program failed with "Bus error" when the same code compiled and ran without any problems on a PC. Not all programs that can be compiled and run are actually correct with regard to the C language standard. It's really bad to rely on undefined behavior.
And a program which can be compiled and run and is "correct" with regard to the C language standard may not do what the programmer wanted.
The classic example being:
if( x = 1 ) ... ;
When what the programmer wanted was
if( x == 1 ) ... ;