Hi all!
I'm working with a TI DM3730 (Cortex-A8 inside) and an external mobile DDR SDRAM.
The startup code initializes the MMU, the L1 and L2 caches, and flow prediction.
Tests with about 256 MiB of data show some data loss when the L2 cache is enabled.
If L2 is disabled and only the L1 I- and D-caches are enabled, the data tests work properly.
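The data test itself is not shown in the thread. As a rough illustration only, a test of this kind could be sketched as below. The function names, the pattern and the two-phase layout are assumptions, not the code that was actually run. The whole region is filled before anything is read back, so that lines have a chance to be evicted from the caches in between.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pattern: derived from the word index so that
 * neighbouring words differ and a stale line is easy to spot. */
static uint32_t pattern_for(size_t index, uint32_t seed)
{
    return ((uint32_t)index * 2654435761u) ^ seed;
}

/* Phase 1: write the pattern to the whole region. */
void fill_region(volatile uint32_t *base, size_t words, uint32_t seed)
{
    assert(base != NULL);
    for (size_t i = 0; i < words; i++)
        base[i] = pattern_for(i, seed);
}

/* Phase 2: read everything back and count the words that differ. */
size_t verify_region(const volatile uint32_t *base, size_t words, uint32_t seed)
{
    size_t errors = 0;

    assert(base != NULL);
    for (size_t i = 0; i < words; i++) {
        if (base[i] != pattern_for(i, seed))
            errors++;
    }
    return errors;
}
```

On the target, `base` would point at the start of external RAM and `words` would cover the 256 MiB region; a non-zero return value corresponds to the data loss described above.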
Here are my MMU settings:
Device-Registers: AP = 3, mode = 2, TEX = 0, C = 0, B = 0
internal RAM: AP = 3, TEX = 0, C = 1, B = 0
external RAM: AP = 3, TEX = 0, C = 1, B = 0
All areas are mapped to domain 0, and all domains are set to client mode.
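For reference, the settings above can be turned into first-level descriptor words as sketched below. This assumes the page table uses 1 MiB section entries in the ARMv7-A short-descriptor format with TEX remap disabled, which the post does not state. The helper names are hypothetical, and the `mode = 2` field of the device entry is not modelled because the post does not say what it maps to.

```c
#include <assert.h>
#include <stdint.h>

/* Build an ARMv7-A short-descriptor first-level section entry (1 MiB).
 * Only the fields listed in the post are set; XN, AP[2], S and nG stay 0. */
uint32_t section_desc(uint32_t base, unsigned ap, unsigned tex,
                      unsigned c, unsigned b, unsigned domain)
{
    assert((base & 0x000FFFFFu) == 0);   /* sections are 1 MiB aligned */
    assert(ap <= 3 && tex <= 7 && c <= 1 && b <= 1 && domain <= 15);

    return base
         | ((uint32_t)tex << 12)         /* TEX[2:0] */
         | ((uint32_t)ap << 10)          /* AP[1:0]  */
         | ((uint32_t)domain << 5)       /* domain   */
         | ((uint32_t)c << 3)            /* C bit    */
         | ((uint32_t)b << 2)            /* B bit    */
         | 0x2u;                         /* bits[1:0] = 0b10: section */
}

/* Domain Access Control Register value with the same 2-bit mode
 * (0 = no access, 1 = client, 3 = manager) for all 16 domains. */
uint32_t dacr_all(unsigned mode)
{
    uint32_t value = 0;

    assert(mode <= 3);
    for (int d = 0; d < 16; d++)
        value |= (uint32_t)mode << (2 * d);
    return value;
}
```

With an assumed external RAM base of 0x80000000, `section_desc(0x80000000u, 3, 0, 1, 0, 0)` gives 0x80000C0A, and `dacr_all(1)` gives 0x55555555, the value written to the domain control register in the startup code.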
If the MMU settings for external RAM are set to TEX = 0, C = 0 and B = 0 (caching disabled for external RAM), there is also no data loss and everything works fine.
Does anyone have an idea?
Here is the complete initialization code:
    /****************************************************
     * disable 'Instruction Cache', 'Flow Prediction'
     * disable 'Data Cache', 'MMU'
     * enable 'Alignment check'
     ****************************************************/
    mrc p15, 0, r0, c1, c0, 1       // read auxiliary control register
    bic r0, r0, #(0x1 << 1)         // disable L2-Cache
    mcr p15, 0, r0, c1, c0, 1       // write back

    mrc p15, 0, r0, c1, c0, 0       // read control register
    bic r0, r0, #(0x1 << 0)         // disable MMU
    orr r0, r0, #(0x1 << 1)         // enable Strict alignment fault checking
    bic r0, r0, #(0x1 << 2)         // disable D-Cache
    bic r0, r0, #(0x1 << 11)        // disable Flow prediction
    bic r0, r0, #(0x1 << 12)        // disable I-Cache
    bic r0, r0, #(0x1 << 13)        // Use Normal exception vector
    mcr p15, 0, r0, c1, c0, 0       // write back

    /****************************************************
     * set 'Vector Base Address'
     ****************************************************/
    ldr r0, =dm3730_initVector
    mcr p15, 0, r0, c12, c0, 0

    /****************************************************
     * Invalidate 'Translation Lookaside Buffer' (TLB)
     * Invalidate 'Instruction Cache' and
     * flush 'branch target cache'
     ****************************************************/
    mov r0, #0
    mcr p15, 0, r0, c8, c7, 0       // Invalidate Inst-TLB and Data-TLB
    mcr p15, 0, r0, c7, c5, 0       // Invalidate all instruction caches and flush branch target cache

    /****************************************************
     * Cache Invalidation code ->
     * github.com/.../startup.s
     ****************************************************/
    mrc p15, 1, r0, c0, c0, 1       // Read CLIDR
    ands r3, r0, #0x07000000        // Extract coherency level
    mov r3, r3, lsr #23             // Total cache levels << 1
    beq processor_FirstBootEntry_cacheExit  // If 0, no need to clean

    mov r8, #0                      // R8 holds current cache level << 1
    processor_FirstBootEntry_cacheLoop1:
    add r2, r8, r8, lsr #1          // R2 holds cache "Set" position
    mov r1, r0, lsr r2              // Bottom 3 bits are the Cache-type for this level
    and r1, r1, #7                  // Isolate those lower 3 bits
    cmp r1, #2
    blt processor_FirstBootEntry_cacheSkip  // No cache or only instruction cache at this level

    mcr p15, 2, r8, c0, c0, 0       // Write the Cache Size selection register
    ISB                             // ISB to sync the change to the CacheSizeID reg
    mrc p15, 1, r1, c0, c0, 0       // Reads current Cache Size ID register
    and r2, r1, #7                  // Extract the line length field
    add r2, r2, #4                  // Add 4 for the line length offset (log2 16 bytes)
    movw r4, #0x3ff
    ands r4, r4, r1, LSR #3         // R4 is the max number on the way size (right aligned)
    clz r5, r4                      // R5 is the bit position of the way size increment
    movw r6, #0x7FFF
    ands r6, r6, r1, LSR #13        // R6 is the max number of the index size (right aligned)

    processor_FirstBootEntry_cacheLoop2:
    mov r7, r4                      // R7 working copy of the max way size (right aligned)

    processor_FirstBootEntry_cacheLoop3:
    orr r1, r8, r7, lsl r5          // Factor in the Way number and cache number into R1
    orr r1, r1, r6, lsl r2          // Factor in the Set number
    mcr p15, 0, r1, c7, c6, 2       // Invalidate by Set/Way
    subs r7, r7, #1                 // Decrement the Way number
    bge processor_FirstBootEntry_cacheLoop3
    subs r6, r6, #1                 // Decrement the Set number
    bge processor_FirstBootEntry_cacheLoop2

    processor_FirstBootEntry_cacheSkip:
    add r8, r8, #2                  // increment the cache number
    cmp r3, r8
    bgt processor_FirstBootEntry_cacheLoop1

    processor_FirstBootEntry_cacheExit:
    DSB

    /****************************************************
     * MMU Settings
     ****************************************************/
    // Copy code from the cache handling here

    /* invalidate TLB */
    mov r0, #0
    mcr p15, 0, r0, c8, c7, 0       // set TLB for instruction and data invalid

    /* set translation table address */
    mov r0, #0
    mcr p15, 0, r0, c2, c0, 2       // set Table-Base-Control register to 0
    ldr r0, =mmuL1PageTable         // load addr of MMU-L1-PageTable
    mcr p15, 0, r0, c2, c0, 0       // set addr of MMU-L1-PageTable

    /****************************************************
     * Setup domain control register -
     * Enable all domains to client mode
     ****************************************************/
    movw r0, #0x5555                // set all domains to client mode
    movt r0, #0x5555
    mcr p15, 0, r0, c3, c0, 0       // write domain control register

    /****************************************************
     * Write L2 Cache Auxiliary Control Register
     ****************************************************/
    mov r0, #0
    mcr p15, 1, r0, c9, c0, 2

    #ifdef __ARM_NEON__
    /****************************************************
     * Enable NEON/VFP
     ****************************************************/
    mrc p15, 0, r0, c1, c0, 2       // Read CP Access register
    orr r0, r0, #(0xF << 20)        // Enable full access to NEON/VFP (Coprocessors 10 and 11)
    mcr p15, 0, r0, c1, c0, 2       // Write CP Access register
    mov r0, #0x40000000             // Switch on the VFP and NEON hardware
    vmsr FPEXC, r0                  // Write FPEXC register, EN bit set
    #endif

    /****************************************************
     * Enable MMU
     ****************************************************/
    mrc p15, 0, r0, c1, c0, 0       // read Control Register
    orr r0, r0, #(0x1 << 0)         // enable MMU
    mcr p15, 0, r0, c1, c0, 0       // write back

    /****************************************************
     * Enable L2-Caches
     ****************************************************/
    mrc p15, 0, r0, c1, c0, 1       // read Auxiliary Control Register
    orr r0, r0, #(0x1 << 1)         // enable L2-Cache
    mcr p15, 0, r0, c1, c0, 1       // write back

    /****************************************************
     * Enable Caches
     ****************************************************/
    mrc p15, 0, r0, c1, c0, 0       // read Control Register
    orr r0, r0, #(0x1 << 12)        // enable I-Cache
    orr r0, r0, #(0x1 << 2)         // enable D-Cache
    orr r0, r0, #(0x1 << 11)        // enable Flow prediction
    mcr p15, 0, r0, c1, c0, 0       // write back
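To make the set/way loop in the invalidation code easier to follow, here is a host-side sketch of how the operand for `mcr p15, 0, r1, c7, c6, 2` is assembled from a Cache Size ID register value. It mirrors the register arithmetic in the assembly (r2 = set shift, r5 = way shift, r8 = level << 1). The function names and the example geometries are assumptions for illustration, not values read from a DM3730.

```c
#include <assert.h>
#include <stdint.h>

/* Portable count-leading-zeros, standing in for the ARM clz instruction. */
static unsigned clz32(uint32_t x)
{
    unsigned n = 0;

    if (x == 0)
        return 32;
    while ((x & 0x80000000u) == 0) {
        x <<= 1;
        n++;
    }
    return n;
}

/* Compose the set/way operand for cache level `level` (0 = L1, 1 = L2). */
uint32_t setway_operand(uint32_t ccsidr, unsigned level,
                        uint32_t way, uint32_t set)
{
    uint32_t set_shift = (ccsidr & 0x7u) + 4;        /* log2(line size in bytes) */
    uint32_t max_way   = (ccsidr >> 3) & 0x3FFu;     /* associativity - 1 */
    uint32_t max_set   = (ccsidr >> 13) & 0x7FFFu;   /* number of sets - 1 */
    unsigned way_shift = clz32(max_way);             /* way field sits at the top */
    uint32_t way_bits;

    assert(level < 7 && way <= max_way && set <= max_set);

    /* A direct-mapped cache has no way bits; avoid shifting by 32 in C. */
    way_bits = (way_shift < 32) ? (way << way_shift) : 0;

    return way_bits | (set << set_shift) | ((uint32_t)level << 1);
}
```

For an assumed 4-way cache with 128 sets and 64-byte lines at level 0, the last way and set give 0xC0001FC0; for an assumed 8-way cache with 512 sets and 64-byte lines at level 1, they give 0xE0007FC2. The loops in the assembly count both fields down to zero from these maxima.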
After some additional research I have solved the problem.
I replaced the generic Cortex-A8 cache invalidation code with a DM3730-specific Secure Monitor Call, as described in the DM3730 Technical Manual (SPRUGN3R), page 3670, section 26.4.1 Booting Overview. Since then all tests have been successful.
New cache invalidation code:
    /****************************************************
     * Cache Invalidation code ->
     * via Secure Monitor, see DM3730 Technical-Manual NDA (SPRUGN3R),
     * page 3670, 26.4.1 Booting Overview
     ****************************************************/
    mov r12, #1
    smc #0
Thanks again for all your help!
Regards
Andreas
Thank you very much for that, and congratulations on solving it. I'd have been pulling my hair out.
Hi Andreas,
I would like to know why your first code for L2 cache invalidation did not work.
Could you show the actual code that is executed by the system call?
Best regards,
Yasuhiko Koumoto.
Hi Yasuhiko,
this is a good question, but unfortunately I can't find the actual code that is executed by the system call.
The NDA data sheet also only has a 'caution' section with a hint about using the system call to invalidate the L2 cache.
I'm a bit confused, because some other OSes such as FreeRTOS use my previous code to invalidate the complete cache on Cortex-A8, but on the DM3730 it doesn't work.
The DM3730 only works properly if the system call is used to invalidate the L2 cache.
Hi Andreas,
thank you for your answer; I understand your situation.
May I ask you to check the contents of the Secure Configuration Register by executing
MRC p15, 0, <Rd>, c1, c1, 0.
Isn't the NS bit (i.e. bit 0) set to 1?
I guess that L2 cache invalidation can only be performed in the Secure privileged state.
Executing SMC puts the CPU into the Secure privileged state even if the NS bit is 1.
Thank you and best regards,
Good point. During my earlier research I also came across the secure privileged mode.
But the problem is that I can't enter this mode. The DM3730 is implemented as a GP device, so it does not support secure privileged mode.
The only way to use functionality in this state is through the pre-implemented system calls via SMC.
I can't find a way to implement my own functionality in secure privileged mode.
If I read the Secure Configuration Register I get an undefined instruction exception.
I get the same problem if I try to switch to secure mode.
I'm not able to install an exception handler for SMC, so I think TI handles this exception in its own handler implemented in ROM code. That means I can only use their functions, which are 'invalidate L2 Cache', 'write L2 Cache Auxiliary Control Register' and 'write Auxiliary Control Register'.
Is this assumption plausible?
Thanks and regards
Yes, it might be.
I think the DM3730 has the Security Extensions, so we have to follow such complicated procedures to invalidate the L2 cache.
You could handle the Monitor exception if you could change VBAR.
However, VBAR and SCR can probably only be changed in Secure mode.
Also, your code probably runs in Non-secure mode.
If the Boot ROM contents are not visible, we have no way of finding out anything further.
My version of the DM3730 is a GP device according to 'CONTROL_PRODUCTION_ID'. This means the security features, including the secure mode of the Cortex-A8, are disabled. So I can't switch to secure mode, and consequently I'm not able to change VBAR or SCR.
So I think there is no way to avoid calling TI's predefined SMC functions.