The time cost of some large functions (executed 1000 times), such as math library functions (take acos for example), varies hugely between runs.
I run this test on an RTOS. Although it is SMP, I only enable one core, and before invoking the function I disable interrupts and lock the task switch, so there should be no interrupts during the execution of the function.
I added some monitoring code to this test. It seems that when the time cost gets longer, the instruction cache miss count gets bigger, and the "Stall because instruction buffer cannot deliver an instruction" count gets bigger too. So it seems this is related to the instruction cache.
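The monitoring uses the ARM11 performance counters, roughly like the sketch below. This is an illustrative sketch rather than my exact code, and the event numbers and PMNC bit layout are what I understand from the ARM1136/ARM11 MPCore TRM, so please double-check them against the TRM for the exact core:

#include <stdint.h>

/* ARM11 (ARMv6) performance monitors via CP15 c15. The event numbers and
 * PMNC field layout below are assumptions based on the ARM11 TRM; verify
 * them against the TRM for the exact core before relying on them. */
#define EVT_ICACHE_MISS 0x00u  /* instruction cache miss                           */
#define EVT_IBUF_STALL  0x01u  /* instruction buffer cannot deliver an instruction */

/* Program PMN0/PMN1 with two events, reset all counters, enable counting. */
static inline void pmu_start(uint32_t evt0, uint32_t evt1)
{
    uint32_t pmnc = (evt1 << 20)   /* EvtCount1: event counted by PMN1 */
                  | (evt0 << 12)   /* EvtCount0: event counted by PMN0 */
                  | (1u << 2)      /* C: reset cycle counter           */
                  | (1u << 1)      /* P: reset PMN0 and PMN1           */
                  | (1u << 0);     /* E: enable the counters           */
    __asm__ volatile("mcr p15, 0, %0, c15, c12, 0" :: "r"(pmnc));
}

static inline uint32_t pmu_read_ccnt(void)   /* cycle counter */
{
    uint32_t v;
    __asm__ volatile("mrc p15, 0, %0, c15, c12, 1" : "=r"(v));
    return v;
}

static inline uint32_t pmu_read_pmn0(void)   /* e.g. I-cache misses */
{
    uint32_t v;
    __asm__ volatile("mrc p15, 0, %0, c15, c12, 2" : "=r"(v));
    return v;
}

static inline uint32_t pmu_read_pmn1(void)   /* e.g. instruction buffer stalls */
{
    uint32_t v;
    __asm__ volatile("mrc p15, 0, %0, c15, c12, 3" : "=r"(v));
    return v;
}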
Then I added an instruction cache invalidation operation before the function, and the time cost became steady.
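For reference, the whole test is structured roughly like the sketch below. The RTOS lock/timer function names are placeholders (the real API in my RTOS is different), and the two CP15 operations are the ARMv6 "invalidate entire instruction cache" and "flush prefetch buffer" operations:

#include <math.h>
#include <stdint.h>

/* Placeholders for the RTOS primitives used in the test; the names are made
 * up for this sketch and do not match any particular RTOS API. */
extern uint32_t irq_lock(void);            /* disable interrupts, return previous state */
extern void     irq_unlock(uint32_t key);  /* restore interrupts                        */
extern void     sched_lock(void);          /* lock the task switch                      */
extern void     sched_unlock(void);
extern uint32_t read_timestamp(void);      /* RTOS high-resolution time source          */

/* Invalidate the entire instruction cache (ARMv6 CP15 c7, c5, 0), then flush
 * the prefetch buffer (c7, c5, 4) so no already-fetched instructions remain. */
static inline void icache_invalidate_all(void)
{
    __asm__ volatile("mcr p15, 0, %0, c7, c5, 0" :: "r"(0) : "memory");
    __asm__ volatile("mcr p15, 0, %0, c7, c5, 4" :: "r"(0) : "memory");
}

static volatile double sink;  /* keep the result live so acos() is not optimized away */

uint32_t time_acos_1000(int invalidate_first)
{
    uint32_t key, t0, t1;
    int i;

    key = irq_lock();
    sched_lock();

    if (invalidate_first)
        icache_invalidate_all();

    t0 = read_timestamp();
    for (i = 0; i < 1000; i++)
        sink = acos(0.5);
    t1 = read_timestamp();

    sched_unlock();
    irq_unlock(key);

    return t1 - t0;
}

With invalidate_first set the measured time is steady; without it, the run-to-run variation described above shows up.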
I see that the ARM11 MPCore uses a round-robin cache replacement policy, so I would expect no time difference whether or not I add the cache invalidation operation.
Who can help me with this issue? Thanks!
Thomas, in the interests of avoiding frustration, be aware that Community is run as a true forum, not a support channel, so the normal rules of forum engagement apply: the comments you get (or lack thereof) will vary depending on the topic and the question/information presented. If you have an entitlement to support, you can use the formal support channel. Email: support@arm.com
It's perhaps worth a general comment that cached A-class cores are non-deterministic by nature. What's in the cache at a given point isn't guaranteed, because of things like speculative accesses and prefetching (i.e. things outside your control in software). It's unsurprising that invalidating the cache gives more consistent (but presumably worse) performance. Generally, cached cores aim to give *much* better performance in the general case. However, it's possible to have a pathological bit of code that shows occasional poor performance. Possibly it's something specific about the code or code structure you can change, possibly not.
First, thank you for your support!
1. I have disabled prefetching.
2. The only code invoked after disabling interrupts and locking the task switch is the math function "acos".
3. Invalidating the instruction cache makes the timing steady and faster. It seems that if I don't invalidate the instruction cache, some cache lines cannot be used (just a guess).
I find it surprising that invalidating the I-cache improves the performance of this routine. This makes me wonder: how many iterations are occurring within the routine you are measuring? The cache relies on re-use, so linear code paths (or sometimes certain code structures, like pointer chases) won't cache well. If you are seeing lots of misses/evictions, you can try playing with code/data placement in memory to try to reduce the contention, or, more dramatically, try to re-arrange the code and make it more "cache friendly".
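For example, one way to experiment with placement when using GCC is to give the hot routine its own section and then position that section explicitly in the linker script. The section name, alignment and routine below are purely illustrative:

/* Place the routine under test in a dedicated section, aligned to a cache
 * line, so the linker script can position it explicitly in memory (e.g.
 * away from code that maps to the same cache sets). */
__attribute__((section(".text.hot"), aligned(32)))
double hot_routine(double x)
{
    return x * x;   /* stand-in for the real routine under test */
}

/* In the linker script, something along the lines of:
 *   .text.hot 0x00100000 : { *(.text.hot) }
 * then move the placement address around and re-measure. */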
More generally...
- Check the page tables in place for the region your code runs over and ensure the attributes are all sensible.
- Be careful when using debuggers - many will perform cache maintenance operations (invalidating caches) and skew your results. Also be wary of semihosting operations supporting printf-type I/O via a debugger.
- Note that I-side stalls can be rooted in D-side contention (for example, an LDR/STR instruction can stall waiting for data), so don't dismiss D-side effects.
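As a starting point for the page-table check, something along these lines dumps the first-level descriptor covering an address so you can inspect the C/B/TEX/XN/AP attributes of the region your code runs in. It assumes the ARMv6 short-descriptor format with 1MB sections and that the translation table can be read at the address held in TTBR0 (e.g. a flat mapping), both of which you would need to confirm on your system:

#include <stdint.h>
#include <stdio.h>

/* Dump the ARMv6 first-level descriptor that covers a virtual address.
 * Assumes TTBCR.N == 0 (16KB L1 table) and that the table is readable at
 * the address held in TTBR0, i.e. a flat/identity mapping of that region. */
static void dump_l1_descriptor(uint32_t va)
{
    uint32_t ttbr0, desc;
    const uint32_t *l1;

    __asm__ volatile("mrc p15, 0, %0, c2, c0, 0" : "=r"(ttbr0));

    l1   = (const uint32_t *)(ttbr0 & 0xFFFFC000u);  /* 16KB-aligned L1 table     */
    desc = l1[va >> 20];                             /* one entry per 1MB section */

    if ((desc & 0x3u) == 0x2u) {                     /* section descriptor */
        printf("va 0x%08x: section 0x%08x TEX=%u C=%u B=%u XN=%u AP=%u\n",
               (unsigned)va, (unsigned)desc,
               (unsigned)((desc >> 12) & 0x7u),      /* TEX[2:0] */
               (unsigned)((desc >> 3)  & 0x1u),      /* C        */
               (unsigned)((desc >> 2)  & 0x1u),      /* B        */
               (unsigned)((desc >> 4)  & 0x1u),      /* XN       */
               (unsigned)((desc >> 10) & 0x3u));     /* AP[1:0]  */
    } else {
        printf("va 0x%08x: descriptor 0x%08x (page table or fault, walk further)\n",
               (unsigned)va, (unsigned)desc);
    }
}

Calling it with the address of your test loop and of acos() shows whether that code sits in a normal, cacheable region.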
Good luck with your project.
MarkN.
Sorry for replying so late.
Thank you, but I have been assigned to another job, so this one has been put aside.