Deven,

From the data you have provided, the data cache is 128 lines, 4-way set associative, with each way containing 32 bytes per line; multiplying these numbers together produces the 16384-byte total size. A 32 kB cache on this implementation would have twice the number of lines (256), and a 64 kB variant would have twice the number of lines again (512).

The line and byte offset within the line is a fixed mapping for any particular byte in memory; however, the byte may live in any of the 4 ways (hence the cache is 4-way set associative). The choice of way is made when the data is first fetched into the cache, based on a victim way pointer, which in turn is driven by some replacement algorithm (pseudo-random, round-robin, etc.).

Given this information, it is theoretically possible for this data cache to hold 16 kB of sequential data starting from any cache-line-aligned memory address, though achieving this will depend on interactions between the code and the cache replacement algorithm.

The 4 kB number you appear to be referring to is the size of a single way of the cache (128 lines * 32 bytes per line). Assuming you don't have any literal loads in your code, this is the size of a contiguous, cache-line-aligned block of data you could repeatedly read (in a loop) where it should be impossible for any evictions to occur after the first time through the loop (each group of 32 bytes will be in a separate line, though not necessarily in the same way).

HTH.
> I mean the total data from the main memory

16 KB for data and 16 KB for instructions are the critical numbers you will want. The rest is just noise unless you design a pathological algorithm which really abuses the cache. Some systems also include an L2 cache, which can hold more data between the L1 caches and main memory.

> Is the cache set size (128 sets) and the cache block/segment size (of other processors) the same?

It varies - basically the scheme you outline means that for any one address there are 4 possible places (ways) where the data may reside. 128 sets (cache lines) * 4 ways * 32 bytes per cache line = 16 KB. 4-way caches are pretty common as they give a good trade-off between speed and cache utilization for typical code. For most caches on ARM systems the number of ways is fixed, but the number of sets depends on the size of the cache. In this case 64 sets = 8 KB cache, 128 sets = 16 KB cache, etc.
Hello,

I had profiled the code below, but I could not see the cache advantage of repeated access. This code is placed in a WinCE application thread and profiled.

[snipped]

Could you explain where the error is?

Thanks,
Deven
[skipped]

Does the first memory access immediately after a cache flush consume fewer cycles than a memory access when the cache already holds some entries? (Assume both are cache-hit conditions.)

I am doing code optimization. Does a cache flush before the algorithm executes benefit the algorithm's performance compared to running it without a cache flush? Put another way, does a cache flush before the operation have any effect on the pseudo-random/round-robin cache line selection and fill?