Hi
I am using MDK5.11a on an STM32F050. I came across a hard fault, and eventually found out that it is the following construct that causes the problem.
The following is just to demonstrate the fault.
uint8_t array[30] ;
uint32_t *intptr ;
uint8_t i ;
.....
.....
for(i = 0 ; i <= 10 ; i++)
{
    intptr = (uint32_t *)&array[i] ;

    // then access via the integer pointer
    if(*intptr == 0)
    {
        ....
    }
}
Sure enough, sooner or later it crashes with a hard fault.
The reason I use this approach to access 32-bit values is that I receive a stream of data from the USART and put it into the array. Depending on the header bytes, the data bytes somewhere further down the stream can be interpreted as either uint16_t or uint32_t. Accessing such a value by casting the address of its first byte to uint32_t* and reading it as 32 bits causes the problem whenever the 4 bytes are not correctly aligned, which in my case is very likely.
The easy way is to copy byte by byte (memcpy) into a uint32_t variable and then process the variable further. But is there any easier way (compiler directives)? As the construct is C compliant, I would have thought that it is the work of the compiler to deal with this automatically without user knowledge about this cross boundary problem?
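A minimal sketch of the memcpy approach, assuming the stream carries the values in the CPU's own byte order; `read_u32_native()` and `read_u16_native()` are hypothetical helper names. memcpy() makes no alignment assumptions, so this is well defined at any offset. With optimisation enabled, compilers commonly turn a small fixed-size memcpy like this into a single load on targets that allow unaligned access, and into byte loads or a helper call on targets that do not.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy 4 bytes from any offset into a properly aligned local variable.
 * Result is in native byte order (little-endian on a typical Cortex-M). */
static uint32_t read_u32_native(const uint8_t *buf, size_t offset)
{
    uint32_t value;
    memcpy(&value, buf + offset, sizeof value);
    return value;
}

/* Same idea for a 16-bit field. */
static uint16_t read_u16_native(const uint8_t *buf, size_t offset)
{
    uint16_t value;
    memcpy(&value, buf + offset, sizeof value);
    return value;
}
```

The local variable is aligned by the compiler, so all further processing uses ordinary aligned accesses.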
Rgds
Calvin
"As the construct is C compliant, I would have thought that it is the work of the compiler to deal with this automatically without user knowledge about this cross boundary problem?"
Take a closer look at the C language standard, and you'll notice that it is the developer who is responsible for making sure that a type-cast pointer points to a memory block with the alignment required for the intended use.
The compiler and linker are only responsible for making sure that every field in a struct has the proper alignment, and that the size and alignment of the struct satisfy the strictest requirement of any of its members.
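The guarantees described above can be written down as compile-time checks. This is a sketch using C11's `_Static_assert` and `_Alignof` (newer than the C99 the thread refers to); `struct msg_header` is a hypothetical example type. Note that none of this helps with a raw byte stream: the guarantees only apply to objects the compiler itself lays out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical message header mixing 8-, 16- and 32-bit fields. */
struct msg_header {
    uint8_t  type;    /* offset 0 */
    uint32_t length;  /* on typical 32-bit targets, 3 padding bytes precede this */
    uint16_t crc;
};

/* Each member sits at an offset suitable for its own type. */
_Static_assert(offsetof(struct msg_header, length) % _Alignof(uint32_t) == 0,
               "uint32_t member is suitably aligned");
_Static_assert(offsetof(struct msg_header, crc) % _Alignof(uint16_t) == 0,
               "uint16_t member is suitably aligned");

/* The struct is at least as aligned as its strictest member... */
_Static_assert(_Alignof(struct msg_header) >= _Alignof(uint32_t),
               "struct alignment covers its strictest member");

/* ...and its size is a multiple of that alignment, so arrays stay aligned. */
_Static_assert(sizeof(struct msg_header) % _Alignof(struct msg_header) == 0,
               "size is a multiple of alignment");
```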
A number of compilers have methods for using packed data, where the compiler is forced to assume the data is not aligned and so performs multiple smaller memory accesses to avoid alignment errors. But this gives larger and slower code, so it is not something that should be used in general.
PC programmers often get hurt by alignment, since the Intel memory controller chips have hardware-accelerated handling of unaligned data: the memory controller automatically detects that a 32-bit read/write is unaligned, converts the 32-bit read into two 32-bit reads, and merges the two values read into the 32-bit value that was originally requested.
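To make the "two reads and a merge" concrete, here is a software model of it, assuming a little-endian machine and memory modelled as an array of aligned 32-bit words. It only illustrates the idea and is not a description of any particular chip's internals; `model_unaligned_read32()` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of an unaligned 32-bit read on a little-endian machine.
 * 'words' is memory as aligned 32-bit words; 'byte_addr' is a byte address. */
static uint32_t model_unaligned_read32(const uint32_t *words, size_t byte_addr)
{
    size_t   index = byte_addr / 4;                 /* word holding the first byte */
    unsigned shift = (unsigned)(byte_addr % 4) * 8u;

    uint32_t low = words[index] >> shift;           /* first aligned read */
    if (shift == 0)
        return low;                                 /* aligned: one read suffices */

    uint32_t high = words[index + 1] << (32u - shift); /* second aligned read */
    return low | high;                              /* merge the two parts */
}
```

For memory holding the bytes 00 11 22 33 44 55 66 77, a read at byte address 1 combines the top three bytes of the first word with the bottom byte of the second, giving 0x44332211.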
Personally, I prefer to implement get16(), get32(), put16() and put32() instead of having the compiler pack/unpack - that makes it clearly visible where packed data is being used.
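A minimal sketch of such accessors, assuming the packed data is little-endian (least significant byte first); the exact signatures are hypothetical. Because they only ever touch single bytes, they work at any alignment and independently of the CPU's own byte order.

```c
#include <stdint.h>

/* Read a little-endian 16-bit value from any address. */
static uint16_t get16(const uint8_t *p)
{
    return (uint16_t)(p[0] | ((unsigned)p[1] << 8));
}

/* Read a little-endian 32-bit value from any address. */
static uint32_t get32(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Write a 16-bit value as little-endian bytes. */
static void put16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
}

/* Write a 32-bit value as little-endian bytes. */
static void put32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}
```

Usage is simply `uint32_t len = get32(&rxbuf[i]);` at whatever offset the header says the field starts.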
As the construct is C compliant,
You wish. But wishing doesn't make it so.
That construct clearly violates a clause of the C standard (C99 6.3.2.3p7), which makes it cause undefined behaviour. That's about as non-compliant as you can be in syntactically correct code.
In a nutshell, every time you "have to" cast a pointer to another pointer type, you're almost certainly violating the C standard in some way, and you actually had better not do that.
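A minimal sketch of checking alignment before any such cast, assuming a flat address space where converting a pointer to uintptr_t yields its address (true on Cortex-M, but implementation-defined in the standard); `is_aligned()` is a hypothetical helper. It only addresses the alignment clause: even a correctly aligned uint32_t * into a uint8_t array still breaks the effective-type (aliasing) rules of C99 6.5p7, so treat this as a diagnostic, not a licence to cast.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if p is a multiple of 'alignment' (a power of two, e.g. 4).
 * Converting to uintptr_t is fine for a misaligned pointer; it is the
 * conversion to uint32_t * that 6.3.2.3p7 makes undefined. */
static bool is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p % alignment) == 0;
}
```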
Thank you both for the clear explanations. I was not aware that alignment also applies to primitive data types; I thought it only applied to structures.
BTW, there is a 'directive/modifier' that can be used for this.
For the sake of testing, I created the following snippet:
.....
volatile uint32_t variable ;

static void testptr(void)
{
    static const uint8_t array[] = {0x00, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77,
                                    0x88, 0x99, 0xaa, 0xbb, 0xcc, 0xdd, 0xee, 0xff} ;
    uint8_t i ;

    for(i = 0 ; i <= sizeof(array) - sizeof(uint32_t) ; i++)
    {
        variable = *(__packed uint32_t *)&array[i] ;
    }
}
........
The *(__packed uint32_t *)&array[i] typecasting did the job.
Funnily enough, this construct is only required for the M0, not for the M3/M4 cores. That is, when my previous code snippet is compiled and tested for M3/M4, it works straight away without a hard fault interrupt.
I think the compiler has already implemented this feature for M3/4 but miss out for M0.
"I think the compiler has already implemented this feature for M3/4 but miss out for M0."
How/why would the compiler do that? If it were preferable for the compiler to produce code that doesn't care about alignment, there would be no reason to have any pragma or other way to specify that data is "packed".
But the problem here is that code written to work with any alignment is larger and slower. So you do not want a compiler that does something as silly as that.
Some memory controllers can mask alignment issues by adding a second memory access and then merging parts of the first and second accesses before returning a value to the processor core. A memory controller can do this without adding any code size; the only cost is the performance lost to the extra memory cycles when the data is unaligned. The disadvantage is that a memory controller that hides incorrectly aligned data doesn't let the developer know when they have done something really stupid and managed to get lots of their data unaligned. The developer will just have to try to figure out why the program runs slower than expected.
So: don't do that, then!
As already mentioned, make yourself a get16() and a get32() function to extract bytes from the stream & build them back into 16- or 32-bit words.
That will also deal with the possibility that the stream has the wrong byte ordering...
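If the stream sends the most significant byte first ("network order"), the same byte-by-byte approach just assembles the bytes the other way round. A sketch, with hypothetical names; `read_field_be()` assumes, purely for illustration, that the frame header tells you whether the next field is 2 or 4 bytes wide.

```c
#include <stdint.h>

/* Big-endian 16-bit read from any address. */
static uint16_t get16_be(const uint8_t *p)
{
    return (uint16_t)(((unsigned)p[0] << 8) | p[1]);
}

/* Big-endian 32-bit read from any address. */
static uint32_t get32_be(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24)
         | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)
         |  (uint32_t)p[3];
}

/* Hypothetical dispatcher: width (2 or 4) comes from the frame header. */
static uint32_t read_field_be(const uint8_t *p, unsigned width)
{
    return (width == 4) ? get32_be(p) : (uint32_t)get16_be(p);
}
```

Because the byte order is fixed by the shifts rather than by the CPU, the same code gives the same result on a little-endian Cortex-M and on a big-endian host.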
Google "serialisation"...
Hi all
I take your point, and will be cautious about this.
I come from the 8-bit MCU world, so this is an area I need to be aware of.
As a matter of interest, I ran the test in the Keil MDK5.11a simulator for both the STM32F030 and the STM32F401RE, with and without __packed. For the STM32F4, both code snippets work fine without a hard fault. But for the STM32F0, the one without __packed crashes.
I am not familiar with the ARM architecture and assembly, but I managed to generate the corresponding assembly code for your reference. Maybe you guys would be interested in explaining why this is the case.
(note: with or without __packed)

volatile uint32_t variable ;

static void testptr(void)
{
    static const uint8_t array[] = {0x00, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88, 0x99, 0xaa, 0xbb, 0xcc, 0xdd, 0xee, 0xff} ;
    uint8_t i ;
    for(i = 0 ; i <= sizeof(array) - sizeof(uint32_t) ; i++)
    {
        variable = *(__packed uint32_t *)&array[i] ;
    }
}

--------------
M0 without __packed

testptr PROC
;;;167
;;;168    static void testptr(void)
000000  2000      MOVS     r0,#0
;;;169    { static const uint8_t array[] = {0x00, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88, 0x99, 0xaa, 0xbb, 0xcc, 0xdd, 0xee, 0xff} ;
;;;170    uint8_t i ; for(i = 0 ; i <= sizeof(array) - sizeof(uint32_t) ; i++)
000002  e005      B        |L8.16|
        |L8.4|
;;;171    { variable = *(uint32_t *)&array[i] ;
000004  4904      LDR      r1,|L8.24|
000006  5809      LDR      r1,[r1,r0]
000008  4a04      LDR      r2,|L8.28|
00000a  6011      STR      r1,[r2,#0]  ; variable
00000c  1c41      ADDS     r1,r0,#1  ;170
00000e  b2c8      UXTB     r0,r1  ;170
        |L8.16|
000010  280c      CMP      r0,#0xc  ;170
000012  d9f7      BLS      |L8.4|
;;;172    }
;;;173    }
000014  4770      BX       lr
;;;174
        ENDP

M0 with __packed

testptr PROC
;;;167
;;;168    static void testptr(void)
000000  b510      PUSH     {r4,lr}
;;;169    { static const uint8_t array[] = {0x00, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88, 0x99, 0xaa, 0xbb, 0xcc, 0xdd, 0xee, 0xff} ;
;;;170    uint8_t i ; for(i = 0 ; i <= sizeof(array) - sizeof(uint32_t) ; i++)
000002  2400      MOVS     r4,#0
000004  e007      B        |L8.22|
        |L8.6|
;;;171    { variable = *(__packed uint32_t *)&array[i] ;
000006  4905      LDR      r1,|L8.28|
000008  1908      ADDS     r0,r1,r4
00000a  f7fffffe  BL       __aeabi_uread4
00000e  4904      LDR      r1,|L8.32|
000010  6008      STR      r0,[r1,#0]  ; variable
000012  1c60      ADDS     r0,r4,#1  ;170
000014  b2c4      UXTB     r4,r0  ;170
        |L8.22|
000016  2c0c      CMP      r4,#0xc  ;170
000018  d9f5      BLS      |L8.6|
;;;172    }
;;;173    }
00001a  bd10      POP      {r4,pc}
;;;174
        ENDP

M4 without __packed

testptr PROC
;;;81
;;;82     static void testptr(void)
0000f2  2000      MOVS     r0,#0
;;;83     { static const uint8_t array[] = {0x00, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88, 0x99, 0xaa, 0xbb, 0xcc, 0xdd, 0xee, 0xff} ;
;;;84     uint8_t i ; for(i = 0 ; i <= sizeof(array) - sizeof(uint32_t) ; i++)
0000f4  e005      B        |L1.258|
        |L1.246|
;;;85     { variable = *(uint32_t *)&array[i] ;
0000f6  4936      LDR      r1,|L1.464|
0000f8  5809      LDR      r1,[r1,r0]
0000fa  4a36      LDR      r2,|L1.468|
0000fc  6011      STR      r1,[r2,#0]  ; variable
0000fe  1c41      ADDS     r1,r0,#1  ;84
000100  b2c8      UXTB     r0,r1  ;84
        |L1.258|
000102  280c      CMP      r0,#0xc  ;84
000104  d9f7      BLS      |L1.246|
;;;86     }
;;;87     }
000106  4770      BX       lr
;;;88
        ENDP

M4 with __packed

testptr PROC
;;;81
;;;82     static void testptr(void)
0000f2  2000      MOVS     r0,#0
;;;83     { static const uint8_t array[] = {0x00, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88, 0x99, 0xaa, 0xbb, 0xcc, 0xdd, 0xee, 0xff} ;
;;;84     uint8_t i ; for(i = 0 ; i <= sizeof(array) - sizeof(uint32_t) ; i++)
0000f4  e006      B        |L1.260|
        |L1.246|
;;;85     { variable = *(__packed uint32_t *)&array[i] ;
0000f6  4937      LDR      r1,|L1.468|
0000f8  4401      ADD      r1,r1,r0
0000fa  6809      LDR      r1,[r1,#0]
0000fc  4a36      LDR      r2,|L1.472|
0000fe  6011      STR      r1,[r2,#0]  ; variable
000100  1c41      ADDS     r1,r0,#1  ;84
000102  b2c8      UXTB     r0,r1  ;84
        |L1.260|
000104  280c      CMP      r0,#0xc  ;84
000106  d9f6      BLS      |L1.246|
;;;86     }
;;;87     }
000108  4770      BX       lr
;;;88
        ENDP
------------
This is a bad habit that you (and many others) just happened to get away with in the 8-bit world!
"May be you guys are interested to explain why this is the case"
As already noted, the language specification states that the behaviour is undefined, so there doesn't have to be any rhyme or reason to it. There isn't really any benefit to analysing it: just don't do it!
It is certainly not something that you should rely upon!
Google "Cortex M4 unaligned access" if you want some information.
Like: infocenter.arm.com/.../index.jsp
As already mentioned, some processors have hardware to handle unaligned access, which means the compiler need not do any black magic, so there is no extra code bloat. But it is still slower than aligned access, because the hardware must still perform multiple memory accesses to fetch the upper and lower parts of the integer and merge them.
For the M0, the compiler called a helper function (__aeabi_uread4 in your listing) to do the unaligned 32-bit read.
For the M4, there was no need for a helper function, since a number of M4 load/store instructions, including the LDR used here, can handle unaligned addresses.
"This is a bad habit that you (and many others) just happened to get away with in the 8-bit world!"
The first time I programmed on Sun workstations, I couldn't understand why my program failed with "Bus error" when the code could be compiled and run without any problems on a PC. Not all programs that can be compiled and run are actually correct with regard to the C language standard. It's really bad to rely on undefined behaviour.
And a program which can be compiled and run and is "correct" with regards to the C language standard may not do what the programmer wanted.
The classic example being:
if( x = 1 ) ... ;
When what the programmer wanted was
if( x == 1 ) ... ;
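A small sketch of why the compiler accepts it: the condition is the value of the assignment, which is 1, so the branch is taken whatever x was. The function names are hypothetical. The doubled parentheses in `buggy_is_one()` are the GCC/Clang convention for marking an assignment in a condition as intentional; they are there only so this demo compiles without the -Wparentheses warning that `-Wall` gives for the bare `if( x = 1 )`.

```c
#include <stdbool.h>

/* The typo: assigns 1 to x, then tests the assigned value (always true). */
static bool buggy_is_one(int x)
{
    if ((x = 1))
        return true;
    return false;
}

/* What was intended: compare x with 1. */
static bool is_one(int x)
{
    if (x == 1)
        return true;
    return false;
}
```

Building with warnings enabled (e.g. GCC/Clang `-Wall`) catches the unparenthesised form. Some people also write the constant first, `if( 1 == x )`, so that the typo `1 = x` becomes a compile error.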