CM4: Write buffer with enabled MPU

Hello,

I have a question regarding the memory protection unit (MPU) on the Cortex-M4 (STM32F3 MCU). This is a fairly simple single-core MCU without caches. I implemented the MPU setup based on the instructions in The Definitive Guide to ARM Cortex-M3 and Cortex-M4 Processors. It states that the bufferable attribute of a memory region, if it is defined by the MPU and the MPU is enabled, takes priority over the default setting.

So, I defined the peripheral region (addresses 0x40000000 - 0x5FFFFFFF) as a separate MPU region (full access, execute never) with the bufferable attribute set. Is there any way to observe a difference in behaviour depending on whether the bufferable attribute is set or not?
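
For reference, here is a minimal sketch of how the MPU_RASR value for a region like the one described above could be composed. The bit positions follow the ARMv7-M MPU_RASR layout; the macro and function names (`RASR_*`, `rasr_size_field`, `peripheral_rasr`) are invented for illustration and are not taken from CMSIS or from the original poster's code. The sketch only computes the value, so it can be checked on a host; writing it to the MPU registers is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions of the ARMv7-M MPU_RASR fields (names are illustrative). */
#define RASR_ENABLE    (1UL << 0)
#define RASR_SIZE_POS  1U
#define RASR_B         (1UL << 16)  /* bufferable */
#define RASR_AP_FULL   (3UL << 24)  /* AP = 0b011: full access */
#define RASR_XN        (1UL << 28)  /* execute never */

/* SIZE field encoding: region size = 2^(SIZE+1) bytes, minimum 32 bytes.
 * Returns -1 if the size is not a valid power of two. */
static int rasr_size_field(uint64_t region_bytes)
{
    if (region_bytes < 32U || region_bytes > (1ULL << 32) ||
        (region_bytes & (region_bytes - 1U)) != 0U) {
        return -1;
    }
    int log2 = 0;
    while ((region_bytes >> log2) != 1U) {
        log2++;
    }
    return log2 - 1;
}

/* RASR value for a 512 MB peripheral region (0x40000000 - 0x5FFFFFFF),
 * full access, execute never, with the B attribute optional. */
static uint32_t peripheral_rasr(int bufferable)
{
    int size = rasr_size_field(0x20000000ULL);  /* 512 MB */
    assert(size >= 0);

    uint32_t rasr = RASR_XN | RASR_AP_FULL | RASR_ENABLE;
    rasr |= (uint32_t)size << RASR_SIZE_POS;
    if (bufferable) {
        rasr |= RASR_B;
    }
    return rasr;
}
```

With these definitions, `peripheral_rasr(1)` and `peripheral_rasr(0)` differ only in bit 16, which is the attribute in question.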

The same goes for internal SRAM. First of all, is SRAM bufferable at all? And how can I see the difference between setting this attribute for SRAM and not setting it?

There is also a prefetch block, which is used for instruction fetches over the ICode bus. Is it somehow connected with the cacheable attribute? Do I have to set the cacheable attribute for Flash if it is covered by the MPU?

Thank you in advance,

Matic

  • For Cortex-M4, the bufferable attribute is observable in two aspects:

    - performance: a bufferable write to a peripheral register is faster than a non-bufferable write (if non-bufferable, the processor needs to wait for the write to complete before the next instruction can execute).

    - bus fault: if you access an invalid peripheral address, you might be able to generate a bus fault. A bus error for a bufferable write is asynchronous (imprecise), and for a non-bufferable write it is synchronous (precise). The BusFault Status Register has corresponding status flags that indicate whether the fault is synchronous or asynchronous.
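
    To tell the two cases apart in a fault handler, the status flags could be decoded along these lines. This is a minimal sketch, assuming the standard ARMv7-M CFSR layout (bus fault flags in bits [15:8]); the enum and function names are illustrative only. On target, the argument would be the value read from `SCB->CFSR` (CMSIS name); here it is a plain function so it can be checked on a host.

    ```c
    #include <stdint.h>

    /* Bus fault flags inside CFSR (BFSR occupies CFSR bits [15:8]). */
    #define CFSR_PRECISERR    (1UL << 9)   /* synchronous (precise) data bus error */
    #define CFSR_IMPRECISERR  (1UL << 10)  /* asynchronous (imprecise) data bus error */
    #define CFSR_BFARVALID    (1UL << 15)  /* BFAR holds the faulting address */

    typedef enum {
        BUS_FAULT_NONE,
        BUS_FAULT_PRECISE,
        BUS_FAULT_IMPRECISE
    } bus_fault_kind;

    /* Classify a data bus fault from a CFSR snapshot.
     * If both flags are set, the precise fault is reported first. */
    static bus_fault_kind classify_bus_fault(uint32_t cfsr)
    {
        if (cfsr & CFSR_PRECISERR) {
            return BUS_FAULT_PRECISE;
        }
        if (cfsr & CFSR_IMPRECISERR) {
            return BUS_FAULT_IMPRECISE;
        }
        return BUS_FAULT_NONE;
    }
    ```

    For example, a CFSR value of 0x00000400 is classified as an imprecise fault. The faulting address in BFAR is only meaningful when BFARVALID is set, which applies to precise faults.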

    For SRAM, you might still be able to observe a performance difference, but the difference could be smaller (in terms of clock cycle counts). However, SRAM accesses will not generate errors, so you cannot observe a bus error.

    For flash accesses, I don't know whether the STM32's flash access accelerator makes use of the cacheable attribute or not. (The attribute is exported to the system, but the system might not utilise it.) You could try different settings and see if they make a difference in performance. Alternatively, you can ask ST (e.g. by posting the question in the STM32 forum: https://community.st.com/community/stm32-community/stm32-forum/content)

    regards,

    Joseph

  • Dear Mr. Yiu.

    I really appreciate your help...

    I asked this because I found in your book (Def. Guide to CM-3 and CM4), page 364, table 11.10, that the commonly used memory attributes for internal SRAM are "cacheable" and "shareable". We use an STM32F303 MCU (based on CM-4), which doesn't have a flash accelerator. I was curious why you did not suggest making the internal SRAM region bufferable (according to that table). I thought that the write buffer does not improve write accesses to RAM, but if I understand your previous reply correctly, there could be a small difference in performance anyway.

    Best regards,

    Matic

    For normal memory, the interpretation of TEX:S:C:B is a bit different (a bit confusing, I know). When C is 1 and B is 0, the memory is effectively set up as Write-Through cacheable. In this case, as far as I remember, the internal write buffer is used (in other words, the write buffer is used if either C or B is set to 1). However, because the memory attributes from the MPU are exported to the bus, and might or might not be used by the design of ST's SRAM interface, there might be a performance difference.

    I admit that using WT for that example is a bit of an oversight: in processors with advanced memory systems (e.g. Cortex-M7), WB should give better performance. You can set up the SRAM as Write-Back cacheable (C=1, B=1) and see if there is any performance difference.

    From http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka16220.html

      TEX:S:C:B   Description                      MEMATTRS[1:0]:HPROTS[3:2]
    
      000 0 0 0   Strongly Ordered                     10           00
      000 0 0 1   Device, Shareable                    10           01
      000 0 1 0   WT, Non-shareable                    01           10
      000 0 1 1   WB, Non-shareable                    01           11
      000 1 0 0   Strongly Ordered                     10           00
      000 1 0 1   Device, Shareable                    10           01
      000 1 1 0   WT, Shareable                        11           10
      000 1 1 1   WB, Shareable                        11           11
      001 0 0 0   Normal Non-cacheable, Non-shareable  00           10
      001 0 0 1   Reserved                             00           01
      001 0 1 0   Implementation Defined               10           10
      001 0 1 1   WBWA, Non-shareable                  10           11
      001 1 0 0   Normal Non-cacheable, Shareable      10           10
      001 1 0 1   Reserved                             10           01
      001 1 1 0   Implementation Defined               10           10
      001 1 1 1   WBWA, Shareable                      10           11
      010 0 0 0   Device, Non-shareable                00           01 *
      010 0 0 1   Reserved                             00           01
      010 0 1 0   Reserved                             00           10
      010 0 1 1   Reserved                             00           11
      010 1 0 0   Device, Non-shareable                00           01 *
      010 1 0 1   Reserved                             10           01
      010 1 1 0   Reserved                             10           10
      010 1 1 1   Reserved                             10           11
      011 0 0 0   Reserved                             00           00
      011 0 0 1   Reserved                             00           01
      011 0 1 0   Reserved                             00           10
      011 0 1 1   Reserved                             00           11
      011 1 0 0   Reserved                             10           00
      011 1 0 1   Reserved                             10           01
      011 1 1 0   Reserved                             10           10
      011 1 1 1   Reserved                             10           11
      100 0 x x   Normal Non-cacheable, Non-shareable  00           10
      100 1 x x   Normal Non-cacheable, Shareable      10           10
      101 0 x x   WBWA, Non-shareable                  00           11
      101 1 x x   WBWA, Shareable                      10           11
      110 0 x x   WT, Non-shareable                    01           10
      110 1 x x   WT, Shareable                        11           10
      111 0 x x   WB, Non-shareable                    01           11
      111 1 x x   WB, Shareable                        11           11
    

    where

      WT = Normal Cacheable, Write-Through, allocate on read miss
      WB = Normal Cacheable, Write-Back, allocate on read miss
      WBWA = Normal Cacheable, Write-Back, allocate on read and write miss
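
    The memory type column of this table can be turned into a small lookup, which may help when checking what a given TEX:C:B combination means. This is a sketch that mirrors only the table above (the S bit and the exported MEMATTRS/HPROTS values are ignored); the enum and function names are invented for the example.

    ```c
    #include <assert.h>

    typedef enum {
        MEM_STRONGLY_ORDERED,
        MEM_DEVICE,
        MEM_NORMAL_NONCACHEABLE,
        MEM_WT,            /* write-through, allocate on read miss */
        MEM_WB,            /* write-back, allocate on read miss */
        MEM_WBWA,          /* write-back, allocate on read and write miss */
        MEM_IMPL_DEFINED,
        MEM_RESERVED
    } mem_type;

    /* Decode the memory type for a TEX (3 bits), C and B setting,
     * following the rows of the table above. */
    static mem_type decode_mem_type(unsigned tex, unsigned c, unsigned b)
    {
        assert(tex < 8U && c < 2U && b < 2U);
        unsigned cb = (c << 1) | b;

        if (tex & 4U) {
            /* TEX = 1xx: the table lists the type by TEX[1:0] only. */
            static const mem_type by_tex10[4] = {
                MEM_NORMAL_NONCACHEABLE, MEM_WBWA, MEM_WT, MEM_WB
            };
            return by_tex10[tex & 3U];
        }

        switch (tex) {
        case 0U: {
            static const mem_type t[4] = {
                MEM_STRONGLY_ORDERED, MEM_DEVICE, MEM_WT, MEM_WB
            };
            return t[cb];
        }
        case 1U: {
            static const mem_type t[4] = {
                MEM_NORMAL_NONCACHEABLE, MEM_RESERVED,
                MEM_IMPL_DEFINED, MEM_WBWA
            };
            return t[cb];
        }
        case 2U:
            return (cb == 0U) ? MEM_DEVICE : MEM_RESERVED;
        default:
            return MEM_RESERVED;
        }
    }
    ```

    With TEX=000, C=1, B=0 this returns MEM_WT, and with TEX=000, C=1, B=1 it returns MEM_WB.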
    

    regards,

    Joseph

  • Thank you. Your response is very valuable to me.

    I did a bit of testing (measured CPU clocks via debugger for 500 program cycles):

    1. Write Buffer disabled (DISDEFWBUF set) and MPU not used: 1.696.833

    2. Write Buffer enabled and MPU not used: 1.676.663

    3. Write Buffer enabled and MPU enabled with the settings below: 1.695.451

    #define FLASH_MEMORY_ATT       (MPU_RASR_C_Msk)
    #define PERIPHERALS_ATT        (MPU_RASR_B_Msk | MPU_RASR_S_Msk)
    #define INT_SRAM_MEMORY_ATT    0

    4. Write Buffer enabled and MPU enabled with the settings below: 1.676.825

    #define FLASH_MEMORY_ATT       (MPU_RASR_C_Msk)
    #define PERIPHERALS_ATT        (MPU_RASR_B_Msk | MPU_RASR_S_Msk)
    #define INT_SRAM_MEMORY_ATT    (MPU_RASR_C_Msk | MPU_RASR_S_Msk)

    5. Write Buffer enabled and MPU enabled with the settings below: 1.676.832

    #define FLASH_MEMORY_ATT       (MPU_RASR_C_Msk)
    #define PERIPHERALS_ATT        (MPU_RASR_B_Msk | MPU_RASR_S_Msk)
    #define INT_SRAM_MEMORY_ATT    (MPU_RASR_B_Msk | MPU_RASR_C_Msk | MPU_RASR_S_Msk)

    So, there is no obvious difference whether or not the B attribute is set for SRAM (as long as C is set), but there is some difference if neither C nor B is set.
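
    The differences relative to case 1 can be worked out with a small helper like the one below. This is only a sketch for the arithmetic; the function names are invented, and the inputs are the five cycle counts listed above.

    ```c
    #include <stdint.h>

    /* Cycles saved by a measurement compared with a baseline
     * (positive means the measurement was faster). */
    static int64_t cycles_saved(uint32_t baseline, uint32_t measured)
    {
        return (int64_t)baseline - (int64_t)measured;
    }

    /* The same difference expressed as a percentage of the baseline. */
    static double percent_saved(uint32_t baseline, uint32_t measured)
    {
        return 100.0 * (double)cycles_saved(baseline, measured) / (double)baseline;
    }
    ```

    Taking case 1 (1696833 cycles) as the baseline, cases 2, 4 and 5 save roughly 20000 cycles each, which is about 1.2 % or around 40 cycles per program cycle, while case 3 saves 1382 cycles, which is less than 0.1 %.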

    Regards