ARM1176JZ-S, cache config: effective cache size calculation

Note: This was originally posted on 22nd February 2009 at http://forums.arm.com

Hello,

1) I am using an ARM1176JZ-S core with a WinCE platform. The cache memory is configured as follows:

    DCache: 128 sets, 4 ways, 32-byte line size, 16384 bytes total
    ICache: 128 sets, 4 ways, 32-byte line size, 16384 bytes total

    Now I want to know the effective data cache size, that is, how much data from main memory
    can be cached and accessed within a function without cache thrashing.

2) Is the number of cache sets (128 sets) the same thing as the cache block/segment size on other processors?

Kindly reply, thanks in advance.

Regards,
Deven
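For reference, the geometry quoted in the question can be checked with a few lines of arithmetic. This is only a sketch: the constant and function names are made up for illustration, and it assumes the usual set-associative mapping in which the set index comes from the address bits just above the line offset.

```c
#include <assert.h>
#include <stdint.h>

/* Geometry quoted in the question: 128 sets, 4 ways, 32-byte lines. */
enum { CACHE_SETS = 128, CACHE_WAYS = 4, CACHE_LINE_BYTES = 32 };

/* Total capacity = sets * ways * line size. */
static uint32_t cache_total_bytes(void)
{
    return (uint32_t)CACHE_SETS * CACHE_WAYS * CACHE_LINE_BYTES;
}

/* Bytes covered by one way; addresses this far apart land in the same set. */
static uint32_t cache_way_bytes(void)
{
    return (uint32_t)CACHE_SETS * CACHE_LINE_BYTES;
}

/* Set selected by an address (assumed mapping: bits [11:5] here). */
static uint32_t cache_set_index(uint32_t addr)
{
    return (addr / CACHE_LINE_BYTES) % CACHE_SETS;
}

/* Sanity check against the 16384-byte figure given in the question. */
static void cache_geometry_check(void)
{
    assert(cache_total_bytes() == 16384u);
    assert(cache_set_index(0x0000u) == cache_set_index(0x1000u));
}
```

Under these assumptions, a contiguous, line-aligned 16384-byte buffer places exactly four lines in each set, so 16 KB is an upper bound rather than a guarantee: the stack, globals and OS data compete for the same sets. More than four lines spaced 4096 bytes apart cannot all stay resident, however small the total. A set is a group of four lines here, whereas the 32-byte line is what is usually called a cache block.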
  • Note: This was originally posted on 9th March 2009 at http://forums.arm.com

    Assuming that your QUERY_START and QUERY_END macros call a system function to get the time stamp, I would think that you are spending a significant amount of time in the kernel just processing the time request.

    Your test loop is quite short (64K ops, running from cache, may only take 100K-200K cycles). There is an off chance that the system call takes much longer than this (system calls are expensive in most OSes), and because your loop has just filled the D cache, the system call will run even slower than it normally would.
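One way to see whether the timestamp call dominates is to measure it on its own and to time many repetitions of the loop inside a single window. The sketch below is an illustration only: `timestamp_fn`, `now_clock`, `touch_buffer` and the other names are hypothetical, and `clock()` merely stands in for whatever QUERY_START and QUERY_END actually call.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical timestamp source: any function returning a tick count. */
typedef uint64_t (*timestamp_fn)(void);

/* Portable stand-in; a WinCE build would substitute its own timer call. */
static uint64_t now_clock(void)
{
    return (uint64_t)clock();
}

/* 16 KB working set matching the D-cache size from the question. */
static volatile uint8_t work_buf[16384];
static volatile uint32_t sink;

/* Example body: read one byte from every 32-byte line of the buffer. */
static void touch_buffer(void)
{
    uint32_t sum = 0;
    for (uint32_t i = 0; i < sizeof work_buf; i += 32)
        sum += work_buf[i];
    sink = sum;
}

/* Smallest gap seen between two back-to-back timestamp reads. */
static uint64_t timer_overhead(timestamp_fn now, int trials)
{
    uint64_t best = 0;
    for (int i = 0; i < trials; ++i) {
        uint64_t t0 = now();
        uint64_t t1 = now();
        if (i == 0 || t1 - t0 < best)
            best = t1 - t0;
    }
    return best;
}

/* Average ticks per run of body over reps runs timed in one window,
   after subtracting the measured timer overhead once. */
static uint64_t ticks_per_run(timestamp_fn now, void (*body)(void),
                              unsigned reps, uint64_t overhead)
{
    if (reps == 0)
        return 0;
    uint64_t t0 = now();
    for (unsigned i = 0; i < reps; ++i)
        body();
    uint64_t elapsed = now() - t0;
    elapsed = (elapsed > overhead) ? elapsed - overhead : 0;
    return elapsed / reps;
}
```

A call such as `ticks_per_run(now_clock, touch_buffer, 1000, timer_overhead(now_clock, 100))` amortises the two timestamp calls over 1000 runs, so their cost and their cache footprint matter far less than when a single short loop is bracketed.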

    Functions like printf are also quite data heavy, so you may find that calling printf before the test loop evicts a significant chunk of the data you think you have preloaded into the cache.
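A simple way to keep printf out of the measured region is to record each result in a small array and print everything once all measurements are done. This is a minimal sketch with made-up names (`sample_log`, `log_sample`, `print_log`); storing a sample still touches a cache line, but far less data than a formatted print would.

```c
#include <stdint.h>
#include <stdio.h>

#define MAX_SAMPLES 32

/* Fixed-size log so that recording a result never allocates or prints. */
typedef struct {
    uint64_t ticks[MAX_SAMPLES];
    int count;
} sample_log;

/* Store one measurement; returns 1 on success, 0 if the log is full. */
static int log_sample(sample_log *log, uint64_t ticks)
{
    if (log->count >= MAX_SAMPLES)
        return 0;
    log->ticks[log->count++] = ticks;
    return 1;
}

/* Print all samples after the measurements; returns lines written. */
static int print_log(const sample_log *log, FILE *out)
{
    for (int i = 0; i < log->count; ++i)
        fprintf(out, "run %d: %llu ticks\n", i,
                (unsigned long long)log->ticks[i]);
    return log->count;
}
```

The intended order is: warm the cache, run and time the loop, call `log_sample`, repeat as needed, and only then call `print_log`.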

    (If you are using the CP15 performance counters instead, you stand a better chance of getting meaningful numbers. You can also use them to count cache misses for both I and D to sanity-check your results.)
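As a rough illustration of the CP15 route, the sketch below builds a Performance Monitor Control Register value that counts I-cache and D-cache misses alongside the cycle counter. The bit positions, event numbers and c15, c12 encodings are my reading of the ARM1176JZ-S Technical Reference Manual and should be verified there before use. The helper names are hypothetical, the accessors assume GCC-style inline assembly compiled for ARM state (the WinCE compiler needs a different mechanism), and these registers are normally accessible only from privileged modes.

```c
#include <stdint.h>

/* PMNC fields as understood from the ARM1176JZ-S TRM (verify before use). */
#define PMNC_ENABLE      (1u << 0)   /* enable all three counters */
#define PMNC_RESET_CR    (1u << 1)   /* reset Count Register 0 and 1 */
#define PMNC_RESET_CCNT  (1u << 2)   /* reset the cycle counter */
#define PMNC_EVT1_SHIFT  12          /* event select for Count Register 1 */
#define PMNC_EVT0_SHIFT  20          /* event select for Count Register 0 */

/* Event numbers, again taken from the TRM event table (assumption). */
#define EVT_ICACHE_MISS  0x00u
#define EVT_DCACHE_MISS  0x0Bu

/* Build a control word: select two events, reset and enable the counters. */
static uint32_t pmnc_config(uint32_t evt0, uint32_t evt1)
{
    return ((evt0 & 0xFFu) << PMNC_EVT0_SHIFT)
         | ((evt1 & 0xFFu) << PMNC_EVT1_SHIFT)
         | PMNC_RESET_CCNT | PMNC_RESET_CR | PMNC_ENABLE;
}

#if defined(__GNUC__) && defined(__arm__)
/* ARM11-specific accessors; other ARM cores use different encodings. */
static inline void pmnc_write(uint32_t v)
{
    __asm__ volatile("mcr p15, 0, %0, c15, c12, 0" : : "r"(v));
}

static inline uint32_t ccnt_read(void)      /* cycle counter */
{
    uint32_t v;
    __asm__ volatile("mrc p15, 0, %0, c15, c12, 1" : "=r"(v));
    return v;
}

static inline uint32_t count0_read(void)    /* event selected by evt0 */
{
    uint32_t v;
    __asm__ volatile("mrc p15, 0, %0, c15, c12, 2" : "=r"(v));
    return v;
}

static inline uint32_t count1_read(void)    /* event selected by evt1 */
{
    uint32_t v;
    __asm__ volatile("mrc p15, 0, %0, c15, c12, 3" : "=r"(v));
    return v;
}
#endif
```

On the target, the idea would be to call `pmnc_write(pmnc_config(EVT_ICACHE_MISS, EVT_DCACHE_MISS))` just before the test loop and read the three counters just after it. A D-cache miss count near zero would suggest the working set really did stay in the cache.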

    Constructing benchmarks to measure cache effects can be quite difficult, especially on top of an operating system...