I want to improve the performance of some games on our platform. I have heard that the cache lockdown feature can be used to improve performance. Please suggest how to use this feature on the ARM Cortex-A7 for performance improvement.
The L1 and L2 caches of the Cortex-A7 don't support lockdown, so you may want to consider other ways to improve performance. You can refer to the following links.
Re: L2 Cache difference between Cortex-A9 and Cortex-A7?
Does ARMv8 SOC support cache lockdown?
I'd just like to add a comment on this ...
I have heard of cache lock down feature to achieve performance improvement.
... as I've seen this comment before in other forums and would like to pin some additional information somewhere where it is visible to the wider ARM community, as it is a horribly incorrect misconception which I would like to stamp on.
Cache lockdown is almost never a good thing for performance for general purpose application code. Caches are nearly always smaller than the total overall code/data set on a platform. If you lock down 25% of the cache to accelerate one application, you effectively make the cache 25% smaller for everything else. You may make one application faster (possibly, if it happens to have a small data set, although even this is questionable for most applications which are bigger than the L2 size), but at the expense of making everything else running on the platform slower. For this reason cache lockdown is nearly always "the wrong thing", even if it were available. For most software development there is unfortunately no quick fix to making software run faster; you just need to profile and optimize your application hotspots to use better algorithms, cleaner code, less memory, etc.
The only real use case for cache lockdown is for critical sections of hard-realtime systems where guaranteed performance of small code sections (interrupt handlers and the like) is required, and the overall loss of cache (and drop in performance for everything else) is viewed as an acceptable sacrifice to achieve that predictable response time. It is also worth noting that in many markets needing realtime response, TCM is generally available as a synthesis option in the Cortex-R family, so even in those markets there are better alternatives to cache lockdown which provide better area efficiency.
In summary - cache lockdown generally makes your platform slower (due to the smaller average cache size remaining after lockdown), but buys predictable performance for critical realtime sections. It is not, and never has been, an optimization technique to make application code run faster.
HTH, Pete
Hi Peter,
Thanks a lot for your detailed description.
Could you please explain the following line a little better:
It is also worth noting that in many markets needing realtime response, TCM is generally available as a synthesis option in the Cortex-R family, so even in those markets there are better alternatives to cache lockdown ...
Also, could you please let me know the steps to perform cache lockdown for a piece of code (I want to try it on my own).
Anshul
Hi Anshul,
Could you please explain the following line a little better: It is also worth noting that in many markets needing realtime response, TCM is generally available as a synthesis option in the Cortex-R family, so even in those markets there are better alternatives to cache lockdown ...
A TCM (Tightly Coupled Memory) is a flat memory which exists at the same (or similar) level in the memory hierarchy as the L1 cache, often split into pairs so you have an I-TCM and a D-TCM. The processor can use TCM directly without needing to go via the cache for the TCM-mapped address ranges.
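To make that concrete, here is a minimal sketch of how code is typically placed into TCM on a Cortex-R class device. It assumes your linker script defines an ".itcm_code" output section mapped to the I-TCM address range and that startup code has already enabled the TCM; the section name and attributes are toolchain/BSP specific, not anything standard.

/* Sketch only: assumes the linker script provides an ".itcm_code"
 * section located in the I-TCM address range, and that the TCM has
 * been enabled and initialised by the startup code. */
#include <stdint.h>

/* Place a latency-critical handler directly in I-TCM so instruction
 * fetches never depend on cache hit/miss behaviour. */
__attribute__((section(".itcm_code"), noinline))
void critical_irq_handler(void)
{
    /* hard-realtime work here */
}

Compared with locking the handler into a cache way, this costs no cache capacity for the rest of the system, which is the point Pete is making above.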
As Wangyong mentioned, lockdown isn't supported at all on the Cortex-A7. It is "optional" in the ARM architecture and an "implementation option" in the standalone L2 cache controllers. It is very rarely actually implemented, because it is so rarely useful, so there is a good chance that you can't actually do this on your device.
HTH,
Pete
Thanks a lot for the clarification. I want to try it on some other hardware. If possible, please guide me on how to perform cache lockdown on a piece of code. I want to check the impact of cache lockdown. Thanks again.
It depends a little on the hardware you have, so I would check the TRM.
For example – here are the instructions for the L220 cache controller:
http://infocenter.arm.com/help/topic/com.arm.doc.ddi0329l/Beieiiab.html
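As a rough illustration of the lockdown-by-way sequence that the L220 documentation describes, here is a hedged C sketch. The controller base address, the 0x900/0x904 register offsets, the way count and the 32-byte line size are assumptions you must verify against your SoC's memory map and the controller TRM linked above; the sequence also assumes a quiescent system (e.g. early boot, one core running).

/* Sketch only: preload a buffer into one L2 way and lock it there,
 * on an L220-style controller. All addresses/offsets below are
 * assumptions - check your SoC memory map and the L220 TRM. */
#include <stdint.h>

#define L2CC_BASE        0xFFF10000u   /* SoC specific - assumption */
#define L2CC_D_LOCKDOWN  (*(volatile uint32_t *)(L2CC_BASE + 0x900))
#define L2CC_I_LOCKDOWN  (*(volatile uint32_t *)(L2CC_BASE + 0x904))
#define NUM_WAYS         8u            /* check the cache configuration */
#define ALL_WAYS         ((1u << NUM_WAYS) - 1u)

/* A set bit in a lockdown register means "do not allocate into this way". */
static void l2_lock_buffer(const uint8_t *buf, uint32_t size, uint32_t way)
{
    volatile const uint8_t *p;
    uint32_t target = 1u << way;

    /* 1. Block allocation into every way except the target one. */
    L2CC_D_LOCKDOWN = ALL_WAYS & ~target;

    /* 2. Touch the data so the line fills land in the target way.
     *    The buffer should have been cleaned/invalidated first so
     *    these reads really miss and allocate. */
    for (p = buf; p < buf + size; p += 32)   /* 32-byte line - check TRM */
        (void)*p;

    /* 3. Lock the target way and re-enable allocation elsewhere. */
    L2CC_D_LOCKDOWN = target;
}

For locking code rather than data you would use the instruction lockdown register and execute through the routine instead of reading it. Bear in mind everything said earlier in the thread: every locked way is capacity lost to the rest of the system.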
Regards,
Hi Peter,
Due to the TrustZone security extension of the Cortex-A processors, cache lines tagged as secure can't be replaced by a non-secure line fill or accessed by non-secure reads/writes. So do we need to clean and invalidate the cache at the end of secure-world software operation, to avoid normal-world software being able to use only part of the cache?
Best Regards.
Hi Wangyong,
cache lines tagged as secure can't be replaced by a non-secure line fill or accessed by non-secure reads/writes.
Half right =)
TrustZone guarantees that the non-secure environment cannot read or write secure lines. However, it does not guarantee non-eviction of secure lines by non-secure accesses. The line fill policy is fully dynamic - it's not a statically partitioned cache, and a non-secure line fill can evict secure entries (and vice versa).
Hi Peter/Wangyong,
I have one more question regarding cache lockdown. Can the lockdown mechanism be used only for the L1 (I-cache/D-cache), only for the L2 cache, or for both L1 and L2? In the Cortex-A7 TRM I found the following line in the L1 instruction cache controller section:
no lockdown support
So does it mean that only the L1 doesn't support lockdown? Can we use lockdown for the L2 cache? Please help me to clarify this. Thanks a lot.
Thanks a lot. I misunderstood the cache eviction behaviour, and I also found a description in the TrustZone whitepaper that matches your explanation.
Caches
It is a desirable feature of any high performance design to support data of both security states in the caches. This removes the need for a cache flush when switching between worlds, and enables high performance software to communicate over the world boundary. To enable this the L1, and where applicable level two and beyond, processor caches have been extended with an additional tag bit which records the security state of the transaction that accessed the memory.
The content of the caches, with regard to the security state, is dynamic. Any non-locked down cache line can be evicted to make space for new data, regardless of its security state. It is possible for a Secure line load to evict a Non-secure line, and for a Non-secure line load to evict a Secure line.
Neither the L1 (I-cache/D-cache) nor the L2 cache of the Cortex-A7 supports lockdown. The TRM just doesn't describe this in detail.
Hi Peter/Wangyong
Is it possible to check the contents of the cache at run time? i.e. in case of any problem in the system, I want to check whether the cache contents are in sync with main memory or not. How can I store the cache contents in RAM? If it is possible, please let me know the procedure. Thanks a lot!
Not easily - you're trying to make visible something CPUs try very hard to make invisible.
If you hit what you think are coherency problems then you can try inserting cleans of the entire cache, and if the problems go away then that is likely your problem. Another approach I've used in the past, if you know the address range which is causing problems, is to use a master outside of the CPU (such as a DMA engine) to create a copy of the main memory contents, which the CPU can then read back and compare against its view of the original data.
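To illustrate the second technique, here is a minimal sketch of the comparison step. dma_copy_to_uncached() is a hypothetical placeholder for whatever DMA or other non-CPU-master API your platform provides, and the shadow buffer is assumed to live in an uncached (or freshly invalidated) region so the copy really reflects main memory.

/* Sketch only: compare the CPU's (possibly stale) view of a buffer
 * against a copy made by a non-CPU master. dma_copy_to_uncached() is
 * a hypothetical placeholder for your platform's DMA API, and
 * 'shadow' must be uncached or invalidated before the check. */
#include <stdint.h>
#include <stdio.h>

extern void dma_copy_to_uncached(void *dst, const void *src,
                                 uint32_t size);   /* hypothetical */

/* Returns the offset of the first mismatch, or -1 if the CPU's view
 * of 'buf' matches what is actually in main memory. */
int check_coherency(const uint8_t *buf, const uint8_t *shadow,
                    uint32_t size)
{
    uint32_t i;

    /* Copy main memory via the DMA engine, bypassing the CPU caches. */
    dma_copy_to_uncached((void *)shadow, buf, size);

    for (i = 0; i < size; i++) {
        if (buf[i] != shadow[i]) {
            printf("mismatch at offset %u: cpu=0x%02x mem=0x%02x\n",
                   i, buf[i], shadow[i]);
            return (int)i;
        }
    }
    return -1;
}

A mismatch tells you the CPU is holding a dirty or stale line for that address, which is the symptom you would expect from a missing clean or invalidate somewhere in the software.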
Thanks a lot for your reply.
As you know, it's very hard to reproduce cache coherency problems. I am working on Android, and so far I suspect that two problems are caused by cache coherency, but I don't have any proof to confirm this. So what I want is: if an exception occurs, I copy the contents of the cache into RAM. By doing this I can compare the cache contents with RAM and confirm whether the problem is due to cache coherency or some other reason. I am also not sure about the address range. So can you suggest how to analyze such problems if we can't dump the cache contents into RAM?
While I haven't examined it in depth, the TLB seems to retain secure entries for a long time. I copied the normal-world boot code via the secure world and did not flush the secure TLB. After one day of use, the initial secure entries are still present in the TLB. Either the normal-world OS is avoiding section entries, or secure TLB entries on the Cortex-A5 persist for some time. I also have a 'NULL' TLB entry from the secure OS, so it has been interesting to dump the TLBs. I definitely don't think the eviction of secure-world entries is 'standard'. The normal-world OS definitely accesses the same boot sections, and no normal-world entries are allocated.
Also, the eviction of secure-world lines makes them susceptible to the same attacks discussed in this hyper-threading paper.