The execution time of some large functions (each executed 1000 times), such as math library functions (take acos, for example), varies hugely.
I run this test on an RTOS. Although the system is SMP, I enable only one core, and before invoking the function I disable interrupts and lock task switching, so there should be no interrupts or task switches during its execution.
I added some monitoring code to this test. It seems that when the execution time gets longer, the instruction cache miss count rises, and so does the count of "Stall because instruction buffer cannot deliver an instruction". So this seems to be related to the instruction cache.
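Both counters named here correspond to ARM11 performance-monitor events. The following is a minimal sketch of programming them. It assumes event 0x00 is "instruction cache miss", event 0x01 is "stall because instruction buffer cannot deliver an instruction", and that the ARM11 MPCore PMNC register uses the field layout given in the comments. Check all three assumptions against the ARM11 MPCore TRM. The function and constant names are hypothetical. Only `pmnc_value` is portable C; the CP15 accessors compile only for ARM state and must run in a privileged mode.

```c
#include <stdint.h>

/* ARM11 performance monitor event numbers (assumed; check the TRM event table). */
enum {
    EVT_ICACHE_MISS = 0x00, /* instruction cache miss */
    EVT_IBUF_STALL  = 0x01, /* stall: instruction buffer cannot deliver an instruction */
};

/* Compose a PMNC value: EvtCount0 in bits [27:20], EvtCount1 in [19:12],
   plus E (bit 0, enable), P (bit 1, reset PMN0/PMN1), C (bit 2, reset CCNT). */
uint32_t pmnc_value(uint32_t evt0, uint32_t evt1) {
    return ((evt0 & 0xFFu) << 20) | ((evt1 & 0xFFu) << 12) | 0x7u;
}

#if defined(__arm__) && !defined(__thumb__)
/* Program PMNC: selects the two events, resets all counters, starts counting. */
static inline void pmu_start(uint32_t evt0, uint32_t evt1) {
    __asm__ volatile("mcr p15, 0, %0, c15, c12, 0" : : "r"(pmnc_value(evt0, evt1)));
}
/* PMN0 counts EvtCount0 (here: I-cache misses). */
static inline uint32_t pmu_read_pmn0(void) {
    uint32_t v;
    __asm__ volatile("mrc p15, 0, %0, c15, c12, 2" : "=r"(v));
    return v;
}
/* PMN1 counts EvtCount1 (here: instruction buffer stalls). */
static inline uint32_t pmu_read_pmn1(void) {
    uint32_t v;
    __asm__ volatile("mrc p15, 0, %0, c15, c12, 3" : "=r"(v));
    return v;
}
#endif
```

On target, call `pmu_start(EVT_ICACHE_MISS, EVT_IBUF_STALL)` just before the 1000-call loop and read PMN0/PMN1 right after it. That way both counts cover exactly the measured code.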
Then I added an instruction cache invalidation before calling the function, and the execution time became steady.
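The measurement described so far (a timed loop, optionally invalidating the I-cache before each call) can be sketched in C. This is a minimal sketch, assuming an ARM11 target running in a privileged mode with the cycle counter already enabled. `icache_invalidate_all`, `read_cycles`, `measure`, `summarize` and `spin_workload` are hypothetical names. The CP15 encodings follow the ARMv6 c7/c15 layout as I understand it and should be checked against the TRM. The branch-target and prefetch flushes are an assumed "typical" addition to the plain I-cache invalidate. Off target, the CP15 calls fall back to a no-op and `clock()`, so the harness still compiles and runs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#if defined(__arm__) && !defined(__thumb__)
/* ARMv6 CP15 c7 operations (privileged mode; verify against the TRM). */
static inline void icache_invalidate_all(void) {
    uint32_t zero = 0;
    __asm__ volatile(
        "mcr p15, 0, %0, c7, c5, 0\n\t" /* invalidate entire instruction cache */
        "mcr p15, 0, %0, c7, c5, 6\n\t" /* flush entire branch target cache */
        "mcr p15, 0, %0, c7, c5, 4"     /* flush prefetch buffer */
        : : "r"(zero) : "memory");
}
/* CCNT cycle counter; assumes the PMU was already enabled. */
static inline uint32_t read_cycles(void) {
    uint32_t v;
    __asm__ volatile("mrc p15, 0, %0, c15, c12, 1" : "=r"(v));
    return v;
}
#else
/* Host stand-ins so the sketch compiles and runs off target. */
static inline void icache_invalidate_all(void) {}
static inline uint32_t read_cycles(void) { return (uint32_t)clock(); }
#endif

/* Hypothetical stand-in for acos() so the sketch links without libm. */
void spin_workload(void) {
    volatile uint32_t acc = 0;
    for (uint32_t i = 0; i < 1000; ++i) acc += i;
}

/* Call fn n times and record the cycle count of each call in samples[].
   The caller is assumed to have already disabled interrupts and locked
   task switching with the RTOS's own primitives (not shown). */
void measure(void (*fn)(void), size_t n, bool invalidate_first, uint32_t *samples) {
    for (size_t i = 0; i < n; ++i) {
        if (invalidate_first)
            icache_invalidate_all();
        uint32_t start = read_cycles();
        fn();
        samples[i] = read_cycles() - start; /* unsigned subtraction tolerates one wrap */
    }
}

/* Fastest and slowest call: a large gap is the instability under study. */
void summarize(const uint32_t *samples, size_t n, uint32_t *min_out, uint32_t *max_out) {
    uint32_t lo = UINT32_MAX, hi = 0;
    for (size_t i = 0; i < n; ++i) {
        if (samples[i] < lo) lo = samples[i];
        if (samples[i] > hi) hi = samples[i];
    }
    *min_out = lo;
    *max_out = hi;
}
```

Run `measure` twice over the same workload, once with `invalidate_first` false and once with it true. Comparing the two min/max spreads turns the with/without comparison into numbers rather than impressions.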
I see that the ARM11 MPCore uses a round-robin cache replacement policy, so I would expect no difference in execution time whether or not the cache invalidation is added.
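Whether round-robin replacement makes timing independent of history can be explored with a toy model of one 4-way set. The model rests on explicit assumptions, not documented ARM11 MPCore behaviour. The victim is always the way under a round-robin pointer that advances on every fill, regardless of valid bits. Invalidation clears the valid bits but leaves the pointer alone. Tags stand in for cache-line addresses. All names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define WAYS 4u

/* One cache set with a round-robin victim pointer (toy model). */
typedef struct {
    uint32_t tag[WAYS];
    bool valid[WAYS];
    unsigned next; /* way that the next fill will replace */
} rr_set;

/* Clear valid bits only; the pointer is left as-is (assumption). */
void rr_invalidate(rr_set *s) {
    for (unsigned w = 0; w < WAYS; ++w) s->valid[w] = false;
}

/* Replay a sequence of line tags and return how many of them missed. */
unsigned rr_run(rr_set *s, const uint32_t *trace, size_t n) {
    unsigned misses = 0;
    for (size_t i = 0; i < n; ++i) {
        bool hit = false;
        for (unsigned w = 0; w < WAYS; ++w) {
            if (s->valid[w] && s->tag[w] == trace[i]) { hit = true; break; }
        }
        if (!hit) {
            ++misses;
            s->tag[s->next] = trace[i];     /* fill the round-robin victim */
            s->valid[s->next] = true;
            s->next = (s->next + 1) % WAYS; /* advance on every fill */
        }
    }
    return misses;
}
```

With a routine trace {1, 2, 3, 4, 5, 1}, the set misses 6 times when cold or freshly invalidated. It misses 5 times after one unrelated line has been fetched between calls, and 6 again after four unrelated lines. So round-robin replacement still carries state from one call to the next, which makes per-call timing history-dependent. In this toy model, invalidation makes the count repeatable but not lower. It therefore does not by itself reproduce the speedup reported in this thread.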
Can anyone help me with this issue? Thanks!
First, thank you for your support!
1. I disabled prediction fetch.
2. The only code invoked after disabling interrupts and locking task switching is the math function acos.
3. Invalidating the instruction cache makes the timing steady and faster. It seems that if I don't invalidate the instruction cache, some cache lines cannot be used (just a guess).
I find it surprising that invalidating the I-cache improves the performance of this routine. This makes me wonder: how many iterations are occurring within the routine you are measuring? The cache relies on re-use, so linear code paths (or sometimes certain code structures, like pointer chases) won't cache well. If you are seeing lots of misses/evictions, you can try playing with code/data placement in memory to reduce the contention, or, more dramatically, try to re-arrange the code and make it more "cache friendly".
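One way to check whether hot code competes for the same I-cache sets is to compute the set index of each hot line. The sketch below assumes a 32 KB, 4-way cache with 32-byte lines, indexed directly by the address bits shown. The geometry and helper names are hypothetical and should be matched to the actual core configuration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical geometry: 32 KB, 4-way set-associative, 32-byte lines -> 256 sets. */
#define LINE_BYTES 32u
#define WAYS 4u
#define CACHE_BYTES (32u * 1024u)
#define NUM_SETS (CACHE_BYTES / (LINE_BYTES * WAYS))

/* Set that the line containing addr maps to. */
unsigned icache_set(uint32_t addr) {
    return (unsigned)((addr / LINE_BYTES) % NUM_SETS);
}

/* Largest number of the given lines that share one set. Assumes each
   address names a distinct cache line. A result above WAYS means those
   lines cannot all be resident at once, whatever the replacement policy. */
unsigned worst_set_pressure(const uint32_t *line_addrs, size_t n) {
    unsigned count[NUM_SETS] = {0};
    unsigned worst = 0;
    for (size_t i = 0; i < n; ++i) {
        unsigned c = ++count[icache_set(line_addrs[i])];
        if (c > worst) worst = c;
    }
    return worst;
}
```

For example, lines at 0x8000, 0x10000, 0x18000, 0x20000 and 0x28000 all land in set 0, so five hot lines fight over four ways. Moving the last one by a single line (to 0x28020) removes the conflict. This is the kind of placement change meant above.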
More generally...
- Check the page tables in place for the region your code runs over and ensure the attributes are all sensible.
- Be careful when using debuggers: many will perform cache maintenance operations (invalidating caches), skewing your results. Also be wary of semihosting operations supporting printf-type I/O via a debugger.
- Note that I-side stalls can be rooted in D-side contention (for example, an LDR/STR instruction can stall waiting for data), so don't dismiss D-side effects.
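Checking that "the attributes are all sensible" for the code region can start with decoding the memory type of the first-level descriptor that maps it. This is a sketch under stated assumptions: an ARMv6 short-descriptor section entry with TEX remap disabled. The enum names are hypothetical, and the TEX/C/B mapping in the comments should be confirmed against the ARM Architecture Reference Manual. Locating the descriptor through the translation table base is not shown.

```c
#include <stdint.h>

typedef enum {
    MEM_NOT_SECTION,
    MEM_STRONGLY_ORDERED,
    MEM_DEVICE,
    MEM_NORMAL_NONCACHEABLE,
    MEM_NORMAL_WT_NOWA,  /* write-through, no write-allocate */
    MEM_NORMAL_WB_NOWA,  /* write-back, no write-allocate */
    MEM_NORMAL_WB_WA,    /* write-back, write-allocate */
    MEM_NORMAL_OTHER,    /* TEX=1xx (separate inner/outer policy) or reserved */
} mem_type;

/* Decode TEX[14:12], C[3], B[2] of a first-level section descriptor. */
mem_type decode_section(uint32_t desc) {
    if ((desc & 0x3u) != 0x2u)
        return MEM_NOT_SECTION;             /* bits[1:0] must be 0b10 */
    unsigned b = (desc >> 2) & 1u;
    unsigned c = (desc >> 3) & 1u;
    unsigned tex = (desc >> 12) & 7u;
    switch ((tex << 2) | (c << 1) | b) {
    case 0x0: return MEM_STRONGLY_ORDERED;   /* TEX=000 C=0 B=0 */
    case 0x1: return MEM_DEVICE;             /* TEX=000 C=0 B=1, shared device */
    case 0x2: return MEM_NORMAL_WT_NOWA;     /* TEX=000 C=1 B=0 */
    case 0x3: return MEM_NORMAL_WB_NOWA;     /* TEX=000 C=1 B=1 */
    case 0x4: return MEM_NORMAL_NONCACHEABLE;/* TEX=001 C=0 B=0 */
    case 0x7: return MEM_NORMAL_WB_WA;       /* TEX=001 C=1 B=1 */
    case 0x8: return MEM_DEVICE;             /* TEX=010 C=0 B=0, non-shared device */
    default:  return MEM_NORMAL_OTHER;
    }
}
```

If a code region decodes as anything other than a cacheable Normal type, it deserves a second look before reading much into I-cache counters.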
Good luck with your project.
MarkN.
Sorry to reply so late.
Thank you, but I have been assigned to another job, so this one has been set aside.