CM4: Write buffer with enabled MPU

Hello,

I have a question regarding the memory protection unit (MPU) on a Cortex-M4 (STM32F3 MCU). This is a fairly simple single-core MCU without caches. I configured the MPU following the instructions in the Definitive Guide to the ARM Cortex-M4. The book states that the bufferable attribute of a memory region, if it is defined by the MPU and the MPU is enabled, takes priority over the default setting.

So I defined the peripheral region (addresses 0x40000000 to 0x5FFFFFFF) as a separate MPU region (full access, execute never) with the bufferable attribute set. Is there any way to observe a difference in behaviour depending on whether the bufferable attribute is set or not?

The same question applies to the internal SRAM. First of all, is SRAM bufferable at all? And how can I see a difference between setting this attribute for SRAM and leaving it clear?

There is also a prefetch block that is used for instruction fetches over the ICode bus. Is it somehow connected with the cacheable attribute? Do I have to define the cacheable attribute for Flash if it is covered by the MPU?

Thank you in advance,

Matic

Parents
  • For normal memory, the interpretation of TEX:S:C:B is a bit different (a bit confusing, I know). When C is 1 and B is 0, the memory is effectively set up as Write-Through cacheable. In this case, as far as I remember, the internal write buffer is used (in other words, the write buffer is used if either C or B is set to 1). However, because the memory attributes from the MPU are exported to the bus and might or might not be used by the design of ST's SRAM interface, there might be a performance difference.

    I admit that using WT for that example was a bit of an oversight: in processors with advanced memory systems (e.g. Cortex-M7), WB should give better performance. You can set up the SRAM as Write-Back cacheable (C=1, B=1) and see whether there is any performance difference.
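As a minimal sketch of that suggestion, the snippet below composes the MPU_RASR value for an SRAM region in both the Write-Through (C=1, B=0) and Write-Back (C=1, B=1) variants. The bit positions follow the ARMv7-M MPU_RASR layout; the macro and function names (`RASR_*`, `rasr_sram_wt`, `rasr_sram_wb`) and the 32 KB region size used in the usage note are assumptions made for illustration, not taken from the original post. The masks are redefined locally only so that the sketch compiles on a host without the device headers.

```c
#include <stdint.h>

/* ARMv7-M MPU_RASR bit layout. These mirror the CMSIS MPU_RASR_* masks
   but are redefined here (hypothetical names) so the file is self-contained. */
#define RASR_ENABLE    (1u << 0)
#define RASR_SIZE_POS  1u
#define RASR_B         (1u << 16)
#define RASR_C         (1u << 17)
#define RASR_S         (1u << 18)
#define RASR_AP_POS    24u
#define RASR_XN        (1u << 28)

/* AP = 0b011: read/write for privileged and unprivileged code. */
#define RASR_AP_FULL   (3u << RASR_AP_POS)

/* SIZE field encoding: region size in bytes = 2^(SIZE + 1). */
static uint32_t rasr_size_field(uint32_t log2_bytes)
{
    return (log2_bytes - 1u) << RASR_SIZE_POS;
}

/* SRAM as Normal memory, Write-Through (TEX=000, C=1, B=0). */
static uint32_t rasr_sram_wt(uint32_t log2_bytes)
{
    return RASR_AP_FULL | RASR_C | rasr_size_field(log2_bytes) | RASR_ENABLE;
}

/* SRAM as Normal memory, Write-Back (TEX=000, C=1, B=1). */
static uint32_t rasr_sram_wb(uint32_t log2_bytes)
{
    return RASR_AP_FULL | RASR_C | RASR_B | rasr_size_field(log2_bytes) | RASR_ENABLE;
}
```

On the target, a value such as `rasr_sram_wb(15)` (an assumed 32 KB region) would be written to `MPU->RASR` after selecting the region in `MPU->RNR` and setting its base address in `MPU->RBAR`. Swapping between the two helpers is then enough to compare WT against WB timing.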

    From http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka16220.html

      TEX:S:C:B   Description                      MEMATTRS[1:0]:HPROTS[3:2]
    
      000 0 0 0   Strongly Ordered                     10           00
      000 0 0 1   Device, Shareable                    10           01
      000 0 1 0   WT, Non-shareable                    01           10
      000 0 1 1   WB, Non-shareable                    01           11
      000 1 0 0   Strongly Ordered                     10           00
      000 1 0 1   Device, Shareable                    10           01
      000 1 1 0   WT, Shareable                        11           10
      000 1 1 1   WB, Shareable                        11           11
      001 0 0 0   Normal Non-cacheable, Non-shareable  00           10
      001 0 0 1   Reserved                             00           01
      001 0 1 0   Implementation Defined               10           10
      001 0 1 1   WBWA, Non-shareable                  10           11
      001 1 0 0   Normal non-cacheable, Shareable      10           10
      001 1 0 1   Reserved                             10           01
      001 1 1 0   Implementation Defined               10           10
      001 1 1 1   WBWA, Shareable                      10           11
      010 0 0 0   Device, Non-shareable                00           01 *
      010 0 0 1   Reserved                             00           01
      010 0 1 0   Reserved                             00           10
      010 0 1 1   Reserved                             00           11
      010 1 0 0   Device, Non-shareable                00           01 *
      010 1 0 1   Reserved                             10           01
      010 1 1 0   Reserved                             10           10
      010 1 1 1   Reserved                             10           11
      011 0 0 0   Reserved                             00           00
      011 0 0 1   Reserved                             00           01
      011 0 1 0   Reserved                             00           10
      011 0 1 1   Reserved                             00           11
      011 1 0 0   Reserved                             10           00
      011 1 0 1   Reserved                             10           01
      011 1 1 0   Reserved                             10           10
      011 1 1 1   Reserved                             10           11
      100 0 x x   Normal Non-cacheable, Non-shareable  00           10
      100 1 x x   Normal Non-cacheable, Shareable      10           10
      101 0 x x   WBWA, Non-shareable                  00           11
      101 1 x x   WBWA, Shareable                      10           11
      110 0 x x   WT, Non-shareable                    01           10
      110 1 x x   WT, Shareable                        11           10
      111 0 x x   WB, Non-shareable                    01           11
      111 1 x x   WB, Shareable                        11           11
    

    where

      WT = Normal Cacheable, Write-Through, allocate on read miss
      WB = Normal Cacheable, Write-Back, allocate on read miss
      WBWA = Normal Cacheable, Write-Back, allocate on read and write miss
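The table above can also be read as a small lookup function. The sketch below maps a TEX, C, B combination to the memory type listed in the Description column; the enum and function names are hypothetical. It ignores the S bit, which in the table changes only the shareable labelling, and it folds the "Implementation Defined" rows into the reserved value. For the TEX=1xx rows it follows the table and derives the type from TEX[1:0] alone.

```c
#include <stdint.h>

typedef enum {
    MEM_STRONGLY_ORDERED,
    MEM_DEVICE,
    MEM_NORMAL_NONCACHEABLE,
    MEM_NORMAL_WT,
    MEM_NORMAL_WB,
    MEM_NORMAL_WBWA,
    MEM_RESERVED            /* also used for "Implementation Defined" rows */
} mem_type_t;

/* Policy encoding used by the TEX=1xx rows: 00 NC, 01 WBWA, 10 WT, 11 WB. */
static mem_type_t policy_from_bits(unsigned bits)
{
    switch (bits & 3u) {
    case 0u:  return MEM_NORMAL_NONCACHEABLE;
    case 1u:  return MEM_NORMAL_WBWA;
    case 2u:  return MEM_NORMAL_WT;
    default:  return MEM_NORMAL_WB;
    }
}

/* Classify one TEX:C:B combination according to the table. */
static mem_type_t classify_tex_c_b(unsigned tex, unsigned c, unsigned b)
{
    static const mem_type_t tex0[4] = {
        MEM_STRONGLY_ORDERED, MEM_DEVICE, MEM_NORMAL_WT, MEM_NORMAL_WB
    };
    static const mem_type_t tex1[4] = {
        MEM_NORMAL_NONCACHEABLE, MEM_RESERVED, MEM_RESERVED, MEM_NORMAL_WBWA
    };
    unsigned cb = ((c & 1u) << 1) | (b & 1u);   /* index 0..3 from C:B */

    tex &= 7u;
    if (tex & 4u) {
        return policy_from_bits(tex);           /* TEX=1xx rows */
    }
    switch (tex) {
    case 0u:  return tex0[cb];
    case 1u:  return tex1[cb];
    case 2u:  return (cb == 0u) ? MEM_DEVICE : MEM_RESERVED;
    default:  return MEM_RESERVED;              /* TEX=011 */
    }
}
```

For example, `classify_tex_c_b(0, 1, 0)` yields `MEM_NORMAL_WT`, matching the C=1, B=0 case discussed above, while `classify_tex_c_b(0, 0, 0)` yields `MEM_STRONGLY_ORDERED`.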
    

    regards,

    Joseph

Children
  • Thank you. Your response is very valuable to me.

    I did a bit of testing (measured CPU clock cycles with the debugger over 500 program cycles):

    1. Write buffer disabled (DISDEFWBUF set) and MPU not used: 1.696.833

    2. Write buffer enabled and MPU not used: 1.676.663

    3. Write buffer enabled and MPU enabled with the settings below: 1.695.451

    #define FLASH_MEMORY_ATT       (MPU_RASR_C_Msk)
    #define PERIPHERALS_ATT        (MPU_RASR_B_Msk | MPU_RASR_S_Msk)
    #define INT_SRAM_MEMORY_ATT    0

    4. Write buffer enabled and MPU enabled with the settings below: 1.676.825

    #define FLASH_MEMORY_ATT       (MPU_RASR_C_Msk)
    #define PERIPHERALS_ATT        (MPU_RASR_B_Msk | MPU_RASR_S_Msk)
    #define INT_SRAM_MEMORY_ATT    (MPU_RASR_C_Msk | MPU_RASR_S_Msk)

    5. Write buffer enabled and MPU enabled with the settings below: 1.676.832

    #define FLASH_MEMORY_ATT       (MPU_RASR_C_Msk)
    #define PERIPHERALS_ATT        (MPU_RASR_B_Msk | MPU_RASR_S_Msk)
    #define INT_SRAM_MEMORY_ATT    (MPU_RASR_B_Msk | MPU_RASR_C_Msk | MPU_RASR_S_Msk)

    So there is no obvious difference whether the B attribute is 1 for SRAM or not, but there is some difference if neither C nor B is set.
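To put the five measurements on a common scale, the sketch below expresses each cycle count as a difference from case 2 (write buffer enabled, MPU not used) in parts per million. It assumes the dots in the figures are thousands separators, so 1.696.833 is read as 1,696,833 cycles; the array and function names are hypothetical.

```c
#include <stdint.h>

/* Cycle counts for cases 1..5, assuming dots are thousands separators. */
static const uint32_t cycles[5] = {
    1696833u, 1676663u, 1695451u, 1676825u, 1676832u
};

/* Difference of `value` from `baseline` in parts per million.
   Integer arithmetic; the result is truncated toward zero. */
static int32_t delta_ppm(uint32_t baseline, uint32_t value)
{
    int64_t diff = (int64_t)value - (int64_t)baseline;
    return (int32_t)((diff * 1000000) / (int64_t)baseline);
}
```

Calling `delta_ppm(cycles[1], cycles[i])` for each case shows cases 1 and 3 roughly 1.2 % slower than the baseline, while cases 4 and 5 are within about 0.01 % of it, which is consistent with the conclusion above.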

    Regards