
STM32F0 uint32_t access across boundary

Hi

I am using MDK5.11a on an STM32F050. I came across a hard fault and eventually found out that the following construct causes the problem.

The following is just to demonstrate the fault.

uint8_t array[30] ;
uint32_t *intptr ;
uint8_t i ;
.....
.....
for(i = 0 ; i <= 10 ; i++)
{
    intptr = (uint32_t *)&array[i] ;   // misaligned whenever &array[i] is not a multiple of 4

    // then access via the integer pointer
    if(*intptr == 0) { .... }
}

Sure enough, sooner or later it crashes with a hard fault.

My reason for using this approach to access 32-bit values is that I receive a stream of data from the USART and put it into the array. Depending on the header bytes, data bytes somewhere further down the stream can be interpreted as either uint16_t or uint32_t. Casting the address of the first byte to uint32_t* and reading it as a 32-bit value can fail if the 4 bytes are not correctly aligned, which in my case is very likely.

The simple way is to copy the bytes (e.g. with memcpy) into a uint32_t variable and then process that variable further. But is there an easier way, such as a compiler directive? As the construct is C compliant, I would have thought that it is the work of the compiler to deal with this automatically without user knowledge about this cross boundary problem?
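
For reference, a minimal sketch of the memcpy approach might look like this (the helper names are hypothetical). It returns the value in the CPU's native byte order, which is little-endian on the STM32 parts discussed here. Compilers commonly inline a fixed-size memcpy, so it is often cheaper than it looks.

```c
#include <stdint.h>
#include <string.h>

/* Read a 32-bit value from any byte address without an unaligned access:
 * memcpy copies the bytes into a properly aligned local variable.
 * The result is in the CPU's native byte order. */
static uint32_t read_u32_native(const uint8_t *p)
{
    uint32_t value;
    memcpy(&value, p, sizeof value);
    return value;
}

/* Same idea for 16-bit fields. */
static uint16_t read_u16_native(const uint8_t *p)
{
    uint16_t value;
    memcpy(&value, p, sizeof value);
    return value;
}
```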

Rgds

Calvin

  • "As the construct is C compliant, I would have thought that it is the work of the compiler to deal with this automatically without user knowledge about this cross boundary problem?"

    Take a closer look at the C language standard, and you'll notice that it is the developer who is responsible for making sure that a type-cast pointer points to a memory block with the alignment required for the intended use.

    The compiler and linker are only responsible for making sure that all fields in a struct have the proper alignment, and that the size and alignment of the struct satisfy the strictest requirement of any of its members.
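
    A small illustration of those guarantees, assuming a C11 compiler for _Alignof and _Static_assert (the struct is a made-up example, not from any real protocol):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical struct: the compiler inserts padding so each member is aligned. */
struct frame_header {
    uint8_t  type;     /* offset 0 */
    /* padding bytes are typically inserted here */
    uint32_t length;   /* placed at a multiple of _Alignof(uint32_t) */
    uint16_t crc;
    /* trailing padding so sizeof is a multiple of the struct's alignment */
};

/* Compile-time checks of the guarantees described above (C11). */
_Static_assert(offsetof(struct frame_header, length) % _Alignof(uint32_t) == 0,
               "length member is aligned");
_Static_assert(offsetof(struct frame_header, crc) % _Alignof(uint16_t) == 0,
               "crc member is aligned");
_Static_assert(sizeof(struct frame_header) % _Alignof(struct frame_header) == 0,
               "struct size is a multiple of its alignment");
```

    With the usual ARM ABI (AAPCS), length lands at offset 4 and the struct is 12 bytes. None of this helps with a raw byte buffer, because the compiler never laid that data out.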

    A number of compilers have ways of declaring packed data, where the compiler is forced to assume the data is not aligned and so performs multiple shorter memory accesses to work around alignment errors. But this gives larger and slower code, so it is not something that should be used generally.
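
    For comparison, GCC and Clang express this kind of packed access with __attribute__((packed)) rather than Keil's __packed. A minimal sketch, assuming one of those compilers (the struct and function names are hypothetical):

```c
#include <stdint.h>

/* GCC/Clang extension, not standard C: a one-member packed struct tells the
 * compiler the uint32_t may sit at any address, so it emits code that is safe
 * for unaligned data on the target. may_alias keeps reads of a byte buffer
 * through this type from being broken by strict-aliasing optimisations. */
struct __attribute__((packed, may_alias)) unaligned_u32 {
    uint32_t v;
};

static uint32_t load_u32_packed(const void *p)
{
    return ((const struct unaligned_u32 *)p)->v;
}

static void store_u32_packed(void *p, uint32_t v)
{
    ((struct unaligned_u32 *)p)->v = v;
}
```

    The cost is the same as with __packed: on a core without unaligned-load support, every access through this type becomes several narrower loads or stores.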

    PC programmers often get bitten by alignment, since Intel memory controller chips have hardware-accelerated handling of unaligned data: the memory controller automatically detects when a 32-bit read/write is unaligned, converts the 32-bit read into two 32-bit reads, and merges the two values read into the 32-bit value that was originally requested.

    Personally, I prefer to implement get16(), get32(), put16() and put32() instead of having the compiler pack/unpack - that makes it clearly visible where packed data is being used.
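
    A minimal sketch of such helpers, assuming the stream stores multi-byte values least significant byte first (the exact signatures are illustrative):

```c
#include <stdint.h>

/* Assumption: the stream is little-endian (least significant byte first).
 * Each helper touches one byte at a time, so alignment never matters. */
static uint16_t get16(const uint8_t *p)
{
    return (uint16_t)(p[0] | ((uint16_t)p[1] << 8));
}

static uint32_t get32(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

static void put16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v & 0xFFu);
    p[1] = (uint8_t)(v >> 8);
}

static void put32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v & 0xFFu);
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}
```

    Because every access is a single byte, these work at any offset on any core, the M0 included.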

  • As the construct is C compliant,

    You wish. But wishing doesn't make it so.

    That construct clearly violates a clause of the C standard (C99 6.3.2.3p7), which means it causes undefined behaviour. That's about as non-compliant as you can be in syntactically correct code.
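
    For illustration only, this is what "correctly aligned" means at the address level. It assumes a flat address space where converting a pointer to uintptr_t yields the plain address, which holds on Cortex-M but is implementation-defined in C (the helper name is hypothetical):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if p is a multiple of `align` (which must be a power of two).
 * Assumes uintptr_t holds the plain numeric address, as on Cortex-M. */
static bool is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p & (align - 1u)) == 0u;
}
```

    Even when the check passes, reading a uint8_t buffer through a uint32_t* can still break the C aliasing rules (C99 6.5p7), so the byte-wise or memcpy approaches remain the safer choice.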

    In a nutshell, every time you "have to" cast a pointer to another pointer type, you're almost certainly violating the C standard in some way, and you actually had better not do that.

    Thank you for the clear explanations, guys. I was not aware that alignment also applies to primitive data types; I thought it only applied to structures.

    BTW, there is a 'directive/modifier' that can be applied in this case.

    For the sake of testing, I create the following snippet:

    .....
    volatile uint32_t variable ;

    static void testptr(void)
    {
        static const uint8_t array[] = {0x00, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88, 0x99, 0xaa, 0xbb, 0xcc, 0xdd, 0xee, 0xff} ;
        uint8_t i ;
        for(i = 0 ; i <= sizeof(array) - sizeof(uint32_t) ; i++)
        {
            variable = *(__packed uint32_t *)&array[i] ;
        }
    }

    ........

    The *(__packed uint32_t *)&array[i] typecasting did the job.

    Funnily enough, this construct is not required for the M3/M4 cores, only for the M0. That is, when my previous code snippet is compiled and tested for M3/M4, it works straight away without a hard fault.

    I think the compiler has already implemented this feature for M3/4 but miss out for M0.

    Rgds

    Calvin

  • "I think the compiler has already implemented this feature for M3/4 but miss out for M0."

    How/why would the compiler do that? If it were preferable for the compiler to produce code that doesn't care about alignment, there would be no reason to have any pragma or other way to specify that data is "packed".

    But the problem here is that code written to work with any alignment is larger and slower. So you do not want a compiler that does something as silly as that.

    Some memory controllers can mask alignment issues by issuing a second memory access and then merging parts of the first and second accesses before returning a value to the processor core. A memory controller can do this without adding any code size, and with a performance loss only from the extra memory cycle when the data is unaligned. The disadvantage is that a memory controller that hides incorrectly aligned data doesn't let the developer know when they have done something really stupid and managed to get lots of their data unaligned; the developer just has to figure out why the program runs slower than expected.
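
    A toy software model of that merge, assuming a little-endian, 32-bit-wide memory. This is only an illustration of the idea, not how any particular controller is implemented, and the function name is made up:

```c
#include <stddef.h>
#include <stdint.h>

/* `mem` holds aligned 32-bit words and `addr` is a byte address.
 * An aligned read costs one bus cycle; an unaligned read costs two aligned
 * reads whose parts are shifted and merged. The caller must ensure that
 * mem[addr / 4 + 1] exists when addr is unaligned. */
static uint32_t controller_read32(const uint32_t *mem, size_t addr,
                                  unsigned *bus_cycles)
{
    size_t word = addr / 4u;
    unsigned shift = (unsigned)(addr % 4u) * 8u;

    if (shift == 0u) {              /* aligned: a single access */
        *bus_cycles = 1u;
        return mem[word];
    }
    /* unaligned: low bytes from the first word, high bytes from the next */
    *bus_cycles = 2u;
    return (mem[word] >> shift) | (mem[word + 1u] << (32u - shift));
}
```

    No extra instructions appear in the program, but the unaligned case still pays for the second memory cycle.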

  • So: don't do that, then!

    As already mentioned, make yourself a get16() and a get32() function to extract bytes from the stream & build them back into 16- or 32-bit words.

    That will also deal with the possibility that the stream has the wrong byte ordering...
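
    The big-endian ("network order") counterparts differ only in which byte ends up in the high bits. A sketch, with hypothetical names:

```c
#include <stdint.h>

/* Assumption: the stream is big-endian (most significant byte first). */
static uint16_t get16_be(const uint8_t *p)
{
    return (uint16_t)(((uint16_t)p[0] << 8) | p[1]);
}

static uint32_t get32_be(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24)
         | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)
         |  (uint32_t)p[3];
}
```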

    Google "serialisation"...
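
    As a small serialisation example tied to the original problem, here is a sketch of decoding a stream where a header byte decides whether a 16- or 32-bit value follows. The frame layout (type 0x01 = 16-bit, 0x02 = 32-bit, little-endian payload) and all names are invented for illustration:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Cursor over a received byte buffer; invariant: pos <= len. */
struct reader {
    const uint8_t *buf;
    size_t len;
    size_t pos;
};

/* Reads n (<= 4) bytes little-endian; returns false if the stream is short. */
static bool read_le(struct reader *r, size_t n, uint32_t *out)
{
    uint32_t v = 0;
    if (r->len - r->pos < n)
        return false;
    for (size_t k = 0; k < n; k++)
        v |= (uint32_t)r->buf[r->pos + k] << (8u * k);
    r->pos += n;
    *out = v;
    return true;
}

/* Decodes one frame into *value; returns false on a short or unknown frame. */
static bool decode_frame(struct reader *r, uint32_t *value)
{
    uint32_t type;
    if (!read_le(r, 1, &type))
        return false;
    switch (type) {
    case 0x01: return read_le(r, 2, value);
    case 0x02: return read_le(r, 4, value);
    default:   return false;
    }
}
```

    The payload offset never needs to be aligned, and a truncated frame is reported instead of reading past the end of the buffer.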

  • Hi all

    I take your point, and will be cautious about this.

    I come from the 8-bit MCU world, so this is an area I need to be aware of.

    As a matter of interest, I ran the test in the Keil MDK5.11a simulator for the STM32F030 and the STM32F401RE, in both cases with and without __packed. For the STM32F4, both code snippets work fine without a hard fault. But for the STM32F0, the one without __packed crashes.

    I am not familiar with the ARM architecture and assembly, but I managed to generate the corresponding assembly code for your reference. May be you guys are interested to explain why this is the case.

    (Note: the source below is the __packed version; the listings that follow were generated both with and without __packed.)
    
    
    volatile uint32_t variable ;
    
    static void testptr(void)
    { static const uint8_t array[] = {0x00, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88, 0x99, 0xaa, 0xbb, 0xcc, 0xdd, 0xee, 0xff} ;
      uint8_t i ; for(i = 0 ; i <= sizeof(array) - sizeof(uint32_t) ; i++)
      { variable = *(__packed uint32_t *)&array[i] ;
      }
    }
    
    --------------
    M0 without __packed
    
                      testptr PROC
    ;;;167
    ;;;168    static void testptr(void)
    000000  2000              MOVS     r0,#0
    ;;;169    { static const uint8_t array[] = {0x00, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88, 0x99, 0xaa, 0xbb, 0xcc, 0xdd, 0xee, 0xff} ;
    ;;;170      uint8_t i ; for(i = 0 ; i <= sizeof(array) - sizeof(uint32_t) ; i++)
    000002  e005              B        |L8.16|
                      |L8.4|
    ;;;171      { variable = *(uint32_t *)&array[i] ;
    000004  4904              LDR      r1,|L8.24|
    000006  5809              LDR      r1,[r1,r0]
    000008  4a04              LDR      r2,|L8.28|
    00000a  6011              STR      r1,[r2,#0]  ; variable
    00000c  1c41              ADDS     r1,r0,#1              ;170
    00000e  b2c8              UXTB     r0,r1                 ;170
                      |L8.16|
    000010  280c              CMP      r0,#0xc               ;170
    000012  d9f7              BLS      |L8.4|
    ;;;172      }
    ;;;173    }
    000014  4770              BX       lr
    ;;;174
                              ENDP
    
    M0 with __packed
    
                      testptr PROC
    ;;;167
    ;;;168    static void testptr(void)
    000000  b510              PUSH     {r4,lr}
    ;;;169    { static const uint8_t array[] = {0x00, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88, 0x99, 0xaa, 0xbb, 0xcc, 0xdd, 0xee, 0xff} ;
    ;;;170      uint8_t i ; for(i = 0 ; i <= sizeof(array) - sizeof(uint32_t) ; i++)
    000002  2400              MOVS     r4,#0
    000004  e007              B        |L8.22|
                      |L8.6|
    ;;;171      { variable = *(__packed uint32_t *)&array[i] ;
    000006  4905              LDR      r1,|L8.28|
    000008  1908              ADDS     r0,r1,r4
    00000a  f7fffffe          BL       __aeabi_uread4
    00000e  4904              LDR      r1,|L8.32|
    000010  6008              STR      r0,[r1,#0]  ; variable
    000012  1c60              ADDS     r0,r4,#1              ;170
    000014  b2c4              UXTB     r4,r0                 ;170
                      |L8.22|
    000016  2c0c              CMP      r4,#0xc               ;170
    000018  d9f5              BLS      |L8.6|
    ;;;172      }
    ;;;173    }
    00001a  bd10              POP      {r4,pc}
    ;;;174
                              ENDP
    
    M4 without __packed
                      testptr PROC
    ;;;81
    ;;;82     static void testptr(void)
    0000f2  2000              MOVS     r0,#0
    ;;;83     { static const uint8_t array[] = {0x00, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88, 0x99, 0xaa, 0xbb, 0xcc, 0xdd, 0xee, 0xff} ;
    ;;;84       uint8_t i ; for(i = 0 ; i <= sizeof(array) - sizeof(uint32_t) ; i++)
    0000f4  e005              B        |L1.258|
                      |L1.246|
    ;;;85       { variable = *(uint32_t *)&array[i] ;
    0000f6  4936              LDR      r1,|L1.464|
    0000f8  5809              LDR      r1,[r1,r0]
    0000fa  4a36              LDR      r2,|L1.468|
    0000fc  6011              STR      r1,[r2,#0]  ; variable
    0000fe  1c41              ADDS     r1,r0,#1              ;84
    000100  b2c8              UXTB     r0,r1                 ;84
                      |L1.258|
    000102  280c              CMP      r0,#0xc               ;84
    000104  d9f7              BLS      |L1.246|
    ;;;86       }
    ;;;87     }
    000106  4770              BX       lr
    ;;;88
                              ENDP
    
    M4 with __packed
    
                      testptr PROC
    ;;;81
    ;;;82     static void testptr(void)
    0000f2  2000              MOVS     r0,#0
    ;;;83     { static const uint8_t array[] = {0x00, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88, 0x99, 0xaa, 0xbb, 0xcc, 0xdd, 0xee, 0xff} ;
    ;;;84       uint8_t i ; for(i = 0 ; i <= sizeof(array) - sizeof(uint32_t) ; i++)
    0000f4  e006              B        |L1.260|
                      |L1.246|
    ;;;85       { variable = *(__packed uint32_t *)&array[i] ;
    0000f6  4937              LDR      r1,|L1.468|
    0000f8  4401              ADD      r1,r1,r0
    0000fa  6809              LDR      r1,[r1,#0]
    0000fc  4a36              LDR      r2,|L1.472|
    0000fe  6011              STR      r1,[r2,#0]  ; variable
    000100  1c41              ADDS     r1,r0,#1              ;84
    000102  b2c8              UXTB     r0,r1                 ;84
                      |L1.260|
    000104  280c              CMP      r0,#0xc               ;84
    000106  d9f6              BLS      |L1.246|
    ;;;86       }
    ;;;87     }
    000108  4770              BX       lr
    ;;;88
                              ENDP
    
    
    ------------
    

    Rgds

    Calvin

  • This is a bad habit that you (and many others) just happened to get away with in the 8-bit world!

    "May be you guys are interested to explain why this is the case"

    As already noted, the language specification states that the behaviour is undefined, so there doesn't have to be any rhyme or reason to it. There isn't really any benefit to analysing it; just don't do it!

    It is certainly not something that you should rely upon!

  • Google "Cortex M4 unaligned access" if you want some information.

    Like:
    infocenter.arm.com/.../index.jsp

    As already mentioned, some processors have hardware to handle unaligned access, which means the compiler need not do any black magic, so there is no extra code bloat. But it is still slower than aligned access, because the hardware must still perform multiple memory accesses to get the upper and lower parts of the integer and merge them.

    For the M0, the compiler did call a helper function (__aeabi_uread4 in the listing above) to do the unaligned 32-bit read.

    For the M4, there was no need for a helper function, since a number of M4 instructions, including the plain LDR used here, can handle unaligned access.

  • "This is a bad habit that you (and many others) just happened to get away with in the 8-bit world!"

    The first time I programmed on Sun workstations, I couldn't understand why my program failed with "Bus error" when the same code compiled and ran without any problems on a PC. Not all programs that compile and run are actually correct with regard to the C language standard. It's really bad to rely on undefined behavior.

    And a program which compiles, runs and is "correct" with regard to the C language standard may still not do what the programmer wanted.

    The classic example being:

    if( x = 1 ) ... ;
    

    When what the programmer wanted was

    if( x == 1 ) ... ;
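
    A quick sketch of the difference (the function names are made up). GCC with -Wall and Clang by default warn about the unparenthesised assignment form, which is one more reason to build with warnings enabled:

```c
#include <stdbool.h>

/* What was written: assigns 1 to x, then tests the stored value (always true).
 * Doubled parentheses are used only so this demo compiles without a warning;
 * the original if( x = 1 ) behaves identically. */
static bool branch_taken_assign(int x)
{
    if ((x = 1))
        return true;
    return false;
}

/* What was meant: compares x with 1 and leaves x unchanged. */
static bool branch_taken_compare(int x)
{
    if (x == 1)
        return true;
    return false;
}
```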