This entry is extracted from a post originally published on Carbon Design Systems’ blog by Andy Meier in February 2013. It has been edited to use the updated product names following Carbon’s acquisition by ARM.

 

When starting any new SoC design, it is important to ask a variety of questions, especially when selecting IP. An architect may ask: for my application, how will the cache subsystem handle our existing code? What about code not yet written? How many instructions per cycle will the CPU be able to execute? What percentage of time will be spent on cache and TLB misses? This blog dives into how an architect at one of Carbon’s customers answered these questions and used the answers to drive the IP selection process.

 

CPU: Cortex-A9 or Cortex-A7?

The first IP selection question our customer faced was whether to use a Cortex®-A9 or to design with the new Cortex-A7. According to ARM, the Cortex-A7 will enable entry-level smartphone designs below $100, with performance equivalent to that of a $500 high-end smartphone of just a few years ago. Pretty impressive!

 

Among its many features, the Cortex-A7 has an integrated L1 and L2 cache, which allows lower transaction latencies and ultimately improved memory system performance. The Cortex-A9 architecture, by contrast, supports a 16, 32 or 64KB L1 cache, with L2 cache supported through the optional PL310 L2 cache controller.

 

The designer’s intuition told them to choose the Cortex-A7, but they wanted to confirm their choice by benchmarking on a cycle-accurate virtual prototype set up for several experiments that varied cache size, latency configuration, and interconnect possibilities. The benchmarks they chose for their experiments were Dhrystone and CoreMark. To jump-start their effort, they began with Cycle Performance Analysis Kits (CPAKs) developed around the Cortex-A9 and Cortex-A7. Each CPAK contains not only a simple platform but also the bare-metal benchmarks and sample initialization code, which allowed them to get up and running immediately.
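To make the shape of such an experiment sweep concrete, here is a minimal Python sketch that enumerates one run per combination of cache size, latency setting, interconnect and benchmark. It is not the customer's setup and not part of SoC Designer: the function name `build_experiments`, the latency values and the interconnect labels are all hypothetical placeholders. Only the L1 sizes (16, 32, 64KB) and the two benchmark names come from the text above.

```python
from itertools import product

# L1 sizes mentioned in the article for the Cortex-A9.
L1_DCACHE_KB = [16, 32, 64]
# Hypothetical placeholder values; the article does not list the latencies tried.
L2_LATENCY_CYCLES = [4, 8]
# Hypothetical labels; the article does not name the interconnect options.
INTERCONNECTS = ["interconnect_a", "interconnect_b"]
# Benchmarks named in the article.
BENCHMARKS = ["Dhrystone", "CoreMark"]


def build_experiments(cache_sizes, latencies, interconnects, benchmarks):
    """Return one dict per combination of the swept parameters."""
    return [
        {
            "l1_dcache_kb": size,
            "l2_latency_cycles": latency,
            "interconnect": interconnect,
            "benchmark": benchmark,
        }
        for size, latency, interconnect, benchmark in product(
            cache_sizes, latencies, interconnects, benchmarks
        )
    ]


experiments = build_experiments(
    L1_DCACHE_KB, L2_LATENCY_CYCLES, INTERCONNECTS, BENCHMARKS
)

for exp in experiments[:3]:
    print(exp)
print(f"{len(experiments)} runs in total")
```

Enumerating the full cross product up front makes it easy to see how quickly the number of runs grows as more parameters are varied.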

 

They began their analysis by running Dhrystone on a single-CPU Cortex-A9 configuration with a 32KB D-cache and an external L2 cache. They examined the cache behavior by looking at the cache events provided with each component. Using SoC Designer’s profiling capability, they were quickly able to see how the benchmark was exercising the cache subsystem.

Figure 1: Cache Activity from Cortex-A9 running Dhrystone

 

In addition to D-cache characteristics, they gathered I-cache and TLB information by examining the PMU events from the Cortex-A9. They used these to calculate the D-cache, I-cache and TLB miss rates, as percentages, for both Dhrystone and CoreMark.
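The arithmetic behind these figures is simple: a miss rate is the miss count divided by the access count, and instructions per cycle is retired instructions divided by elapsed cycles. The Python sketch below illustrates this under stated assumptions: the function names and dictionary keys are hypothetical rather than actual PMU event names, and the counts are placeholder values for illustration only, not measurements from these experiments.

```python
def miss_rate_percent(misses: int, accesses: int) -> float:
    """Return misses as a percentage of accesses (0.0 if nothing was accessed)."""
    if misses < 0 or accesses < 0:
        raise ValueError("event counts must be non-negative")
    if accesses == 0:
        return 0.0
    return 100.0 * misses / accesses


def instructions_per_cycle(instructions: int, cycles: int) -> float:
    """Return the average number of instructions executed per clock cycle."""
    if cycles <= 0:
        raise ValueError("cycle count must be positive")
    return instructions / cycles


# Placeholder counts for illustration only; NOT data from the article.
# The key names are hypothetical, not real PMU event identifiers.
sample_counts = {
    "dcache_accesses": 200_000,
    "dcache_misses": 1_000,
    "icache_accesses": 400_000,
    "icache_misses": 800,
    "tlb_accesses": 200_000,
    "tlb_misses": 100,
    "instructions": 150_000,
    "cycles": 100_000,
}

for unit in ("dcache", "icache", "tlb"):
    rate = miss_rate_percent(
        sample_counts[f"{unit}_misses"], sample_counts[f"{unit}_accesses"]
    )
    print(f"{unit} miss rate: {rate:.3f}%")

ipc = instructions_per_cycle(sample_counts["instructions"], sample_counts["cycles"])
print(f"IPC: {ipc:.2f}")
```

Running the same calculation on the counts from each benchmark and each CPU configuration gives directly comparable numbers.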

 

Figure 2: Cortex-A9 CPU Profiling Events

 

Using ARM IP Exchange, this customer quickly and easily specified the alternate configurations of the Cortex-A9 they were interested in. Updating their platform to use these new models was as simple as selecting "Replace Component" in the SoC Designer Platform menu.

 

Figure 3: ARM IP Exchange Selection page for Cortex-A9

 

Replicating the platform configuration and the experiments set up for the Cortex-A9, they gathered the same profile information, this time for the Cortex-A7. Below you will see the instruction and pipeline profiling information provided with the Cortex-A7 Cycle Model.

 

Figure 4: Cortex-A7 Profiling Events

 

Ultimately, the experiments and analysis they performed confirmed their initial choice of the Cortex-A7 for this project. The real value was that they were able to do all of this in a 100% implementation-accurate environment before finalizing their decision. These initial platforms were also leveraged later in their design cycle for architectural performance optimization.