Last month I shared some details of the work we have been doing with Arm on an HPC test chip, and the good news continues with our announcement of extended support for the AMBA 5 protocol family, including CHI.b, in Interconnect Workbench, our cycle-accurate performance analysis product. I have blogged several times on cycle-accurate performance analysis; links to those posts are provided below.
[CTAToken URL ="/soc/b/blog/posts/performance-analysis-and-verification-of-soc-interconnects" target="_blank" text="Performance Analysis and Verification of SoC Interconnects" class ="green"]
[CTAToken URL ="/soc/b/blog/posts/how-to-measure-and-optimize-the-system-performance-of-a-smartphone-rtl-design" target="_blank" text="How to Measure and Optimize the System Performance of a Smartphone Design - Part 1" class ="green"]
[CTAToken URL ="/soc/b/blog/posts/how-to-measure-and-optimize-the-system-performance-of-a-smartphone-rtl-design---part-2" target="_blank" text="How to Measure and Optimize the System Performance of a Smartphone Design - Part 2" class ="green"]
[CTAToken URL ="/soc/b/blog/posts/how-to-measure-and-optimize-the-system-performance-of-a-smartphone-rtl-design---part-3" target="_blank" text="How to Measure and Optimize the System Performance of a Smartphone Design - Part 3" class ="green"]
[CTAToken URL ="/soc/b/blog/posts/how-to-measure-and-optimize-the-system-performance-of-a-smartphone-rtl-design---part-4" target="_blank" text="How to Measure and Optimize the System Performance of a Smartphone Design - Part 4" class ="green"]
As you can imagine, performance is a key aspect of building an Arm server SoC, and confirming that all the Arm IP is assembled, configured and integrated correctly is a key proof point ahead of tape-out. Analyzing ID reuse schemes, configuring read and write issuing limits, and setting up address hashing across multiple DDR controllers are just some of the challenging tasks that a sophisticated GUI greatly simplifies.
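To illustrate why address hashing across DDR controllers deserves careful analysis, here is a minimal sketch assuming a hypothetical XOR-fold hash, four controllers and 256-byte interleave granules. It is not the hash used by any Arm interconnect or by Interconnect Workbench; it only shows how a power-of-two access stride can pile onto one controller under plain modulo interleaving, while a hash spreads the same traffic evenly.

```python
# Hypothetical address-to-DDR-controller mapping, for illustration only.
# The XOR-fold hash, controller count and granule size are assumptions,
# not the scheme of any particular interconnect.


def modulo_select(addr: int, num_ctrl: int = 4, granule_bits: int = 8) -> int:
    """Plain interleaving: consecutive granules rotate across controllers."""
    return (addr >> granule_bits) % num_ctrl


def xor_fold_select(addr: int, num_ctrl: int = 4, granule_bits: int = 8) -> int:
    """XOR-fold the address bits above the granule into log2(num_ctrl) bits.

    num_ctrl must be a power of two. Bits below granule_bits stay inside one
    interleave granule, so they never affect the selected controller.
    """
    if num_ctrl <= 0 or num_ctrl & (num_ctrl - 1):
        raise ValueError("num_ctrl must be a power of two")
    sel_bits = num_ctrl.bit_length() - 1  # log2(num_ctrl)
    if sel_bits == 0:
        return 0
    mask = num_ctrl - 1
    x = addr >> granule_bits
    result = 0
    while x:  # fold sel_bits at a time until no address bits remain
        result ^= x & mask
        x >>= sel_bits
    return result


def distribution(addrs, select, num_ctrl: int = 4, granule_bits: int = 8):
    """Count how many accesses each controller receives under `select`."""
    counts = [0] * num_ctrl
    for a in addrs:
        counts[select(a, num_ctrl, granule_bits)] += 1
    return counts


if __name__ == "__main__":
    # A 1 KiB stride: with 4 controllers x 256-byte granules, plain modulo
    # interleaving sends every access to controller 0.
    addrs = [i * 1024 for i in range(4096)]
    print("modulo:  ", distribution(addrs, modulo_select))
    print("xor-fold:", distribution(addrs, xor_fold_select))
```

Comparing the two distributions for a given access pattern is the kind of check that realistic traffic makes meaningful, since synthetic strides may not reflect the address patterns a real workload produces.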
Interconnect Workbench automatically generates AMBA testbenches for monitoring complex SoCs, which run on either Xcelium simulation or the Palladium acceleration platform. Palladium support in particular enables software workloads to be run, with bandwidth and latency captured for detailed analysis. Only with realistic workloads, which generate real address patterns and ID usage and exercise real IP with its buffers, can the original architects' performance assumptions be validated.
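As a rough illustration of what "bandwidth and latency captured" means, here is a minimal sketch that summarizes a list of completed transactions. The `Txn` record format and the `summarize` function are hypothetical and do not represent Interconnect Workbench's data model or reports; latency is simply completion cycle minus issue cycle, and bandwidth is total payload bytes over the observed window.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Txn:
    """One captured transaction (hypothetical record format)."""
    start_cycle: int  # cycle on which the request was issued
    end_cycle: int    # cycle on which the transaction completed
    num_bytes: int    # payload size in bytes


def summarize(txns, clock_hz: float) -> dict:
    """Return average/max latency in cycles and bandwidth in bytes per second."""
    if not txns:
        raise ValueError("no transactions captured")
    latencies = [t.end_cycle - t.start_cycle for t in txns]
    # Observation window: first issue to last completion, in cycles.
    window = max(t.end_cycle for t in txns) - min(t.start_cycle for t in txns)
    if window <= 0:
        raise ValueError("observation window must span at least one cycle")
    total_bytes = sum(t.num_bytes for t in txns)
    return {
        "avg_latency_cycles": mean(latencies),
        "max_latency_cycles": max(latencies),
        # bytes / (window / clock_hz) seconds
        "bandwidth_bytes_per_s": total_bytes * clock_hz / window,
    }


if __name__ == "__main__":
    # Three 64-byte reads on an assumed 1 GHz clock.
    txns = [Txn(0, 100, 64), Txn(10, 120, 64), Txn(20, 200, 64)]
    print(summarize(txns, clock_hz=1e9))
```

Reporting the maximum alongside the average matters because a few long-latency transactions, for example those stalled behind a full buffer, can hide behind a healthy-looking mean.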
I will be attending Arm TechCon in two weeks' time. If you are interested in seeing how Interconnect Workbench can help you optimize the performance of your Arm SoC, or would simply like to chat about your specific SoC performance challenges, it would be great to meet up. Please contact me by email at nickh@cadence.com and we can schedule a meeting.