Hello,
I have a question regarding Memory protection unit on Cortex M4 (STM32F3 MCU). This is pretty simple single core MCU without caches. I implemented MPU based on instructions in Definitive guide to the ARM Cortex-M4. It is stated there that the bufferable attribute of the memory, if it is defined by the MPU and the MPU is enabled, has higher priority than the default condition.
So, I defined a peripheral region (addresses from 0x40000000 - 0x5FFFFFFF) as a separate MPU region (full access, execute never) with bufferable attribute set. Is there any chance to see or to distinguish the behaviour if I set bufferable attribute or not?
The same question applies to internal SRAM. First of all, is SRAM bufferable at all? And how can I see a difference depending on whether I set this attribute for SRAM or not?
There is also a prefetch block, which is used for instruction fetches over the ICode bus. Is this somehow connected with the cacheable attribute? Do I have to define the cacheable attribute for Flash if it is covered by the MPU?
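For reference, the peripheral region described above could be encoded into an MPU RASR value roughly as follows. This is only a sketch: the bit positions follow the ARMv7-M RASR layout, but the macro and function names (`RASR_*`, `rasr_size`, `periph_rasr`) are made up for illustration and are not the CMSIS names.

```c
#include <stdint.h>

/* RASR bit positions as laid out in the ARMv7-M MPU (names are illustrative). */
#define RASR_XN       (1u << 28)  /* execute never */
#define RASR_AP_FULL  (3u << 24)  /* AP = 0b011: full access */
#define RASR_B        (1u << 16)  /* bufferable */
#define RASR_ENABLE   (1u << 0)   /* region enable */

/* Encode a region of 2^log2_bytes bytes into RASR.SIZE, bits [5:1].
 * The hardware defines the region size as 2^(SIZE+1) bytes. */
static uint32_t rasr_size(uint32_t log2_bytes)
{
    return ((log2_bytes - 1u) & 0x1Fu) << 1;
}

/* Hypothetical helper: RASR for 0x40000000-0x5FFFFFFF (512 MB = 2^29 bytes),
 * full access, execute never, with the B bit optionally set. */
static uint32_t periph_rasr(int bufferable)
{
    uint32_t rasr = RASR_XN | RASR_AP_FULL | rasr_size(29u) | RASR_ENABLE;
    if (bufferable) {
        rasr |= RASR_B;
    }
    return rasr;
}
```

With these assumptions, `periph_rasr(1)` and `periph_rasr(0)` differ only in bit 16, so switching between the two values is enough to compare bufferable and non-bufferable behaviour.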
Thank you in advance,
Matic
For Cortex-M4, the bufferable attribute is observable in two aspects:
- performance: a bufferable write to a peripheral register is faster than a non-bufferable write (if non-bufferable, the processor needs to wait for the write to complete before the next instruction can execute).
- bus fault: if you access an invalid peripheral address, you might be able to generate a bus fault. A bus error for a bufferable write is asynchronous (imprecise), and for a non-bufferable write it is synchronous (precise). The bus fault status register has corresponding status flags that indicate whether the fault is synchronous or asynchronous.
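As a rough sketch of the second point, a bus fault handler could tell the two cases apart from the Configurable Fault Status Register. The bit positions below are the BFSR flags as they appear inside CFSR on ARMv7-M; the enum and the function name `classify_bus_fault` are hypothetical, and on the target the argument would be read from the CFSR (`SCB->CFSR` in CMSIS).

```c
#include <stdint.h>

/* BFSR flags as seen in CFSR (BFSR occupies CFSR bits [15:8]). */
#define CFSR_PRECISERR    (1u << 9)   /* synchronous (precise) data bus error */
#define CFSR_IMPRECISERR  (1u << 10)  /* asynchronous (imprecise) data bus error */

typedef enum {
    BUS_FAULT_NONE,
    BUS_FAULT_PRECISE,
    BUS_FAULT_IMPRECISE
} bus_fault_kind;

/* Classify a data bus fault from a CFSR value.
 * If both flags happen to be set, the precise one is reported first. */
static bus_fault_kind classify_bus_fault(uint32_t cfsr)
{
    if (cfsr & CFSR_PRECISERR) {
        return BUS_FAULT_PRECISE;
    }
    if (cfsr & CFSR_IMPRECISERR) {
        return BUS_FAULT_IMPRECISE;
    }
    return BUS_FAULT_NONE;
}
```

A write to an invalid peripheral address would then be expected to show up as imprecise with the B attribute set and as precise without it.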
For SRAM, you might still be able to observe a performance difference, but the difference could be smaller (in terms of clock cycle counts). SRAM accesses will not generate errors, however, so you cannot observe a bus error there.
For flash accesses, I don't know whether STM32's flash access accelerator utilizes the cacheable attribute or not. (The attribute is exported to the system, but the system might not utilize it.) You could try different settings and see if it makes a difference in performance. Alternatively, you can ask ST (e.g. by posting the question in the STM32 forum: https://community.st.com/community/stm32-community/stm32-forum/content)
regards,
Joseph
Dear Mr. Yiu.
I really appreciate your help...
I asked this because I found in your book (Def. Guide to CM-3 and CM4), page 364, table 11.10, that the commonly used memory attributes for internal SRAM are "cacheable" and "shareable". We use the STM32F303 MCU (based on CM-4), which doesn't have a flash accelerator. I was curious why you did not suggest making the internal SRAM region bufferable (according to that table). I thought that the write buffer does not improve write access to RAM, but if I understand your previous reply correctly, there could be a small difference in performance anyway.
Best regards,
For normal memory, the interpretation of TEX:S:C:B is a bit different (a bit confusing, I know). When C is 1 and B is 0, the memory is effectively set up as Write-Through cacheable. In this case, as far as I remember, the internal write buffer is used (in other words, the write buffer is used if either C or B is set to 1). However, the memory attributes from the MPU are exported to the bus and might or might not be used by the design of ST's SRAM interface, so there might be a performance difference.
I admit that using WT for that example was a bit of an oversight: in processors with advanced memory systems (e.g. Cortex-M7), WB should give better performance. You can set up the SRAM as Write-Back cacheable (C=1, B=1) and see if there is any performance difference.
From http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka16220.html
TEX  S  C  B   Description                           MEMATTRS[1:0]  HPROTS[3:2]
000  0  0  0   Strongly Ordered                      10             00
000  0  0  1   Device, Shareable                     10             01
000  0  1  0   WT, Non-shareable                     01             10
000  0  1  1   WB, Non-shareable                     01             11
000  1  0  0   Strongly Ordered                      10             00
000  1  0  1   Device, Shareable                     10             01
000  1  1  0   WT, Shareable                         11             10
000  1  1  1   WB, Shareable                         11             11
001  0  0  0   Normal Non-cacheable, Non-shareable   00             10
001  0  0  1   Reserved                              00             01
001  0  1  0   Implementation Defined                10             10
001  0  1  1   WBWA, Non-shareable                   10             11
001  1  0  0   Normal Non-cacheable, Shareable       10             10
001  1  0  1   Reserved                              10             01
001  1  1  0   Implementation Defined                10             10
001  1  1  1   WBWA, Shareable                       10             11
010  0  0  0   Device, Non-shareable                 00             01 *
010  0  0  1   Reserved                              00             01
010  0  1  0   Reserved                              00             10
010  0  1  1   Reserved                              00             11
010  1  0  0   Device, Non-shareable                 00             01 *
010  1  0  1   Reserved                              10             01
010  1  1  0   Reserved                              10             10
010  1  1  1   Reserved                              10             11
011  0  0  0   Reserved                              00             00
011  0  0  1   Reserved                              00             01
011  0  1  0   Reserved                              00             10
011  0  1  1   Reserved                              00             11
011  1  0  0   Reserved                              10             00
011  1  0  1   Reserved                              10             01
011  1  1  0   Reserved                              10             10
011  1  1  1   Reserved                              10             11
100  0  x  x   Normal Non-cacheable, Non-shareable   00             10
100  1  x  x   Normal Non-cacheable, Shareable       10             10
101  0  x  x   WBWA, Non-shareable                   00             11
101  1  x  x   WBWA, Shareable                       10             11
110  0  x  x   WT, Non-shareable                     01             10
110  1  x  x   WT, Shareable                         11             10
111  0  x  x   WB, Non-shareable                     01             11
111  1  x  x   WB, Shareable                         11             11
where
WT = Normal Cacheable, Write-Through, allocate on read miss
WB = Normal Cacheable, Write-Back, allocate on read miss
WBWA = Normal Cacheable, Write-Back, allocate on read and write miss
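To make the TEX = 000 rows of the table easier to apply, here is a small sketch that decodes the C and B bits of an RASR value into the memory type listed above. The bit positions are the ARMv7-M RASR ones; the enum and the function name `mem_type_tex0` are hypothetical, and the S bit is ignored because it only changes the shareability, not the type, in those rows.

```c
#include <stdint.h>

/* RASR attribute bit positions (ARMv7-M); names are illustrative. */
#define RASR_TEX_MASK  (7u << 19)
#define RASR_S         (1u << 18)
#define RASR_C         (1u << 17)
#define RASR_B         (1u << 16)

typedef enum {
    MEM_STRONGLY_ORDERED,  /* C=0, B=0 */
    MEM_DEVICE,            /* C=0, B=1 */
    MEM_WT,                /* C=1, B=0: Write-Through cacheable */
    MEM_WB,                /* C=1, B=1: Write-Back cacheable */
    MEM_OTHER              /* TEX != 000: not handled by this sketch */
} mem_type;

/* Decode the memory type for the TEX = 000 rows of the table. */
static mem_type mem_type_tex0(uint32_t rasr)
{
    if (rasr & RASR_TEX_MASK) {
        return MEM_OTHER;
    }
    if (rasr & RASR_C) {
        return (rasr & RASR_B) ? MEM_WB : MEM_WT;
    }
    return (rasr & RASR_B) ? MEM_DEVICE : MEM_STRONGLY_ORDERED;
}
```

For example, an attribute field of 0 decodes as Strongly Ordered, C|S as WT and B|C|S as WB.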
Thank you. Your response is very valuable to me.
I did a bit of testing (measured CPU clocks via debugger for 500 program cycles):
1. Write Buffer disabled (DISDEFWBUF set) and MPU not used: 1.696.833
2. Write Buffer enabled and MPU not used: 1.676.663
3. Write Buffer enabled and MPU enabled with settings below: 1.695.451
#define FLASH_MEMORY_ATT (MPU_RASR_C_Msk)
#define PERIPHERALS_ATT (MPU_RASR_B_Msk | MPU_RASR_S_Msk)
#define INT_SRAM_MEMORY_ATT 0
4. Write Buffer enabled and MPU enabled with settings below: 1.676.825
#define INT_SRAM_MEMORY_ATT (MPU_RASR_C_Msk | MPU_RASR_S_Msk)
5. Write Buffer enabled and MPU enabled with settings below: 1.676.832
#define INT_SRAM_MEMORY_ATT (MPU_RASR_B_Msk | MPU_RASR_C_Msk | MPU_RASR_S_Msk)
So, there is no obvious difference whether the B attribute is set to 1 for SRAM or not, as long as C is set, but there is some difference if neither C nor B is set.
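To put the numbers side by side, the sketch below computes each run's extra cycles relative to run 2 (write buffer enabled, MPU not used). The cycle counts are the ones measured above; the helper names are made up for illustration. Runs 1 and 3 come out about 1.2% and 1.1% slower, while runs 4 and 5 are within roughly 0.01% of run 2.

```c
#include <stdint.h>

/* Measured CPU cycles for runs 1..5, copied from the list above. */
static const uint32_t cycles[5] = {
    1696833u, 1676663u, 1695451u, 1676825u, 1676832u
};

/* Extra cycles of a run (1-based index) relative to run 2. */
static int32_t extra_cycles(int run)
{
    return (int32_t)cycles[run - 1] - (int32_t)cycles[1];
}

/* Same overhead in hundredths of a percent, rounded towards zero. */
static int32_t extra_basis_points(int run)
{
    return (int32_t)(((int64_t)extra_cycles(run) * 10000) / (int64_t)cycles[1]);
}
```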
Regards