Cortex-A9/A15 L1 d-cache architecture

Note: This was originally posted on 21st March 2012 at http://forums.arm.com

Dear friends,

I'm a PhD candidate at the Complutense University of Madrid. I'm doing research on memory allocation across the memory hierarchy, and I've built a trace-based simulator for memory hierarchies (it differs slightly from existing ones such as Dinero, so I had to build it from scratch).

I'm using this simulator to compare the performance of different allocation policies over different memory hierarchies, including comparisons between hardware-managed caches and software-managed memories. For the cache-based systems, I'm basing my models on the Cortex-A9 and Cortex-A15 cache configurations. However, I have a question about them and I would like to get as much information as possible before proceeding. Please note that I'm not trying to compare the ARM solutions to anything else, but rather to compare software methods for taking advantage of the available memory hierarchies.

My problem is that I know that the L1 data cache is configured as a 32 KB block, 2-way associative, with 64-byte lines, but I can't find any reference to the number of banks into which it's organized. My question is: "Is the cache organized into 8 banks?" That would make sense, as a memory access from the processor would then read just the 64 bits that contain the word. But it's also possible that the cache is configured into 16 banks, so the processor reads 32 bits instead. Or it could even be that the cache is divided into fewer banks and the processor uses some method for internal storage of data just read from the cache...
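
The bank counts considered above follow from dividing the line size by the access width. A minimal sketch of that arithmetic, assuming one bank per access-width slice of a line (the helper names `bank_count` and `bank_of` are hypothetical and do not come from any ARM documentation):

```python
def bank_count(line_bytes: int, access_bits: int) -> int:
    """Banks needed if each bank supplies one access-width slice of a line.

    Assumption: banks are equal width and together cover exactly one line.
    """
    if access_bits <= 0 or access_bits % 8:
        raise ValueError("access width must be a positive whole number of bytes")
    access_bytes = access_bits // 8
    if line_bytes % access_bytes:
        raise ValueError("line size must be a multiple of the access width")
    return line_bytes // access_bytes


def bank_of(address: int, line_bytes: int, access_bits: int) -> int:
    """Bank that would hold `address` under the same assumption."""
    access_bytes = access_bits // 8
    offset_in_line = address % line_bytes
    return offset_in_line // access_bytes


# The two hypotheses from the question, for 64-byte lines.
print(bank_count(64, 64))  # 64-bit accesses
print(bank_count(64, 32))  # 32-bit accesses
```

Whether the real L1 data cache is organized this way is exactly the open question, so the result is only a modeling parameter.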

Also, I'm using the energy consumption values calculated by CACTI 5.3. However, I would really appreciate it if anyone could tell me whether it's possible to get the actual numbers for any ARM parts (I mean, given a manufacturer and a feature size). That way, I could produce more precise results.

Thank you in advance for your help!

Miguel
  • Note: This was originally posted on 21st March 2012 at http://forums.arm.com

    Hi Miguel,

    This isn't a direct answer to your question, but I have a correction: on Cortex-A9 the L1 caches are 4-way set associative and cache lines are only 32 bytes long. Getting that right will probably have a larger impact on your modeling. Your specifications are correct for Cortex-A15.

    As for bank organization, I don't know for sure, but I suspect that the Cortex-A9 L1 isn't banked. From what I understand, banking is useful for allowing multiple accesses (to separate banks) per cycle: more banks mean fewer collisions, and the bank count usually correlates with the read size. On Cortex-A9 the dcache interface is 64 bits wide, so if it were banked I'd expect there to be 4 banks (a 32-byte line divided into 8-byte slices). This line in the TRM seems to suggest the entire cache line is accessed, with a buffer to prevent accessing the same cache line consecutively:

    "To reduce power consumption, the number of full cache reads is reduced by taking advantage of the sequential nature of many cache operations. If a cache read is sequential to the previous cache read, and the read is within the same cache line, only the data RAM set that was previously read is accessed."

    Cortex-A15 does allow a load and a store in the same cycle. No information is given on banking for the L1 dcache, but it's noteworthy that banking information IS given for its L2 cache, which is specified as having 4 banks for tags and 4 banks for data. This is similar to the banking described for the L2 cache on Cortex-A8. Mentioning banking for the L2 but not the L1 cache would be a conspicuous omission if both were banked. It's possible something else is used for L1 parallelism. Maybe tags are duplicated instead of banked. The cache RAM itself could have separate read and write ports.

    Unfortunately, I doubt you'll get an official explanation.
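
A minimal sketch of how the figures in this reply could be plugged into a trace-based simulator. It assumes the 32 KB size used in the thread for both cores, that every read hits, that a full read activates every way's data RAM, and that a read to the same line as the immediately preceding read activates only one way. That last rule is a loose reading of the TRM sentence quoted above, not a description of the real hardware, and all names here are hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheGeometry:
    size_bytes: int
    ways: int
    line_bytes: int

    @property
    def sets(self) -> int:
        # sets = capacity / (associativity * line size)
        return self.size_bytes // (self.ways * self.line_bytes)

    def line_number(self, address: int) -> int:
        return address // self.line_bytes

    def set_index(self, address: int) -> int:
        return self.line_number(address) % self.sets


# Figures as stated in the thread; the 32 KB size is the thread's choice.
CORTEX_A9_L1D = CacheGeometry(32 * 1024, ways=4, line_bytes=32)
CORTEX_A15_L1D = CacheGeometry(32 * 1024, ways=2, line_bytes=64)


def data_ram_activations(geometry: CacheGeometry, read_addresses) -> int:
    """Count data-RAM way activations for a sequence of read addresses.

    Assumption: a read in the same line as the previous read touches one
    way; any other read touches all ways. Tag RAMs and misses are ignored.
    """
    activations = 0
    previous_line = None
    for address in read_addresses:
        line = geometry.line_number(address)
        activations += 1 if line == previous_line else geometry.ways
        previous_line = line
    return activations


trace = [0, 8, 16, 24, 32]  # five sequential 8-byte reads
print(CORTEX_A9_L1D.sets, data_ram_activations(CORTEX_A9_L1D, trace))
print(CORTEX_A15_L1D.sets, data_ram_activations(CORTEX_A15_L1D, trace))
```

Multiplying the activation count by a per-way read energy (for example, one taken from CACTI) would give a rough data-RAM energy estimate under these assumptions.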