Using Portable Stimulus in the Arm World: Creating bare-metal SW coherency scenarios

In my last blog (Navigating SoC Verification with Perspec Portable Stimulus) I introduced the Accellera Portable Stimulus Standard (PSS) and showed how Cadence Perspec System Verifier supports the creation of portable bare-metal Arm SoC integration tests using the Perspec PSLib for multicore Armv8 and Armv8.2 architectures. In this blog we will dig a little deeper into what PSLib supports and how it can be used out of the box to create a rich variety of coherent and I/O coherent scenarios.

It is worth spending a few minutes revisiting caches and their place across the hierarchy of Arm IP. With the advent of DynamIQ, Arm’s new cluster microarchitecture, there are a multitude of places where cache lives: within each core, usually called L1 cache, which is typically the smallest and fastest cache in the system; shared between cores of like type, usually called L2; shared across the cluster, called L3; and shared across clusters, which may be called the Last Level Cache (LLC) or System Cache, typically the largest but slowest cache in the system.

Example of structure using Arm DynamIQ

There are many architectural options available when constructing such systems, so some or all of these caches may be present in your target system. Interestingly, with the announcement of the new CCIX protocol, we will soon see Arm-based SoCs that also share cache from chip to chip.

Given the number of options, and the need to integrate these complex compute subsystems into bigger SoCs that may also use I/O coherency to optimize system performance for high-speed I/O such as PCI Express, it is essential that caching is fully exercised before committing to silicon: a bug in the integration of the SoC could prove disastrous.

To address this increasingly complex challenge, Cadence developed a rich set of portable actions that make up the Perspec PSLib. These actions are readily assembled into target scenarios, and code is then generated at the push of a button. In fact, for two common cache testing scenarios, the library provides a complete ready-made scenario.

False Sharing

I will now explain the “False Sharing” scenario in a little more detail; look out for my next blog, coming soon, which will cover the “True Sharing” scenario.

False Sharing is a situation where a cache line is used by several cores, so the system treats it as shared data, even though each core uses only its own exclusive part of the line and the cores never actually share data with each other.
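To make the definition concrete, here is a minimal C sketch that classifies one cache line from a record of which cores touch which bytes. It assumes a 64-byte line (matching the figure), at most 32 cores and byte-granular tracking; the access-mask array, `classify_line` and the enum are hypothetical names for illustration, not part of PSLib or Perspec.

```c
#include <stdint.h>

#define LINE_BYTES 64  /* assumed cache line size, as in the figure */

/* Hypothetical classification of a single cache line. */
typedef enum {
    LINE_UNUSED,       /* no core touches the line */
    LINE_PRIVATE,      /* exactly one core touches the line */
    LINE_FALSE_SHARED, /* several cores, but each byte used by one core only */
    LINE_TRUE_SHARED   /* at least one byte is used by two or more cores */
} line_sharing_t;

/* Count the set bits in a 32-bit mask. */
static int popcount32(uint32_t x) {
    int n = 0;
    while (x) {
        x &= x - 1; /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Hypothetical access record: bit c of access[b] is set if core c
   reads or writes byte b of the line (so at most 32 cores). */
line_sharing_t classify_line(const uint32_t access[LINE_BYTES]) {
    uint32_t cores_on_line = 0;
    for (int b = 0; b < LINE_BYTES; b++) {
        if (popcount32(access[b]) > 1)
            return LINE_TRUE_SHARED; /* cores really share this byte */
        cores_on_line |= access[b];
    }
    int n = popcount32(cores_on_line);
    if (n == 0)
        return LINE_UNUSED;
    if (n == 1)
        return LINE_PRIVATE;
    return LINE_FALSE_SHARED;        /* several cores, disjoint bytes */
}
```

A line where core 0 uses bytes 0 to 9 and core 1 uses the rest classifies as `LINE_FALSE_SHARED`; marking any single byte as used by both cores turns it into `LINE_TRUE_SHARED`.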

The figure below shows, by colour, which core is using which bytes of each 64-byte cache line. We can immediately see that within each cache line, regions of data are used exclusively by one core (one colour). This is what we mean by False Sharing.

False Sharing example

Notice also that the regions are not of regular size, although each is naturally a whole number of bytes. The number of possible False Sharing situations is enormous, especially when the permutations of hierarchical cache architectures are also considered. Covering a good number of these permutations with hand-written bare-metal SW scenarios would be a significant challenge.
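To put a rough number on “enormous”, consider a single 64-byte line where each of k cores owns exactly one contiguous, non-empty region. Choosing the k-1 region boundaries among the 63 gaps between bytes gives C(63, k-1) splits, and assigning the k cores to the regions multiplies this by k!. The C sketch below computes that count under those simplifying assumptions (one line, contiguous regions, no unused bytes); the function names are hypothetical.

```c
#include <stdint.h>

/* Binomial coefficient C(n, k), built up incrementally. Each intermediate
   value equals C(n - k + i, i), so every division is exact. */
uint64_t binomial(unsigned n, unsigned k) {
    if (k > n)
        return 0;
    if (k > n - k)
        k = n - k; /* use the smaller, symmetric k */
    uint64_t r = 1;
    for (unsigned i = 1; i <= k; i++)
        r = r * (n - k + i) / i;
    return r;
}

/* Hypothetical helper: number of ways to split a line of line_bytes bytes
   into `cores` contiguous, non-empty regions, each owned by a different core. */
uint64_t false_sharing_layouts(unsigned line_bytes, unsigned cores) {
    if (cores == 0 || cores > line_bytes)
        return 0;
    uint64_t ways = binomial(line_bytes - 1, cores - 1); /* pick cut points */
    for (unsigned c = 2; c <= cores; c++)
        ways *= c;                                       /* times cores! */
    return ways;
}
```

For four cores, `false_sharing_layouts(64, 4)` returns 39711 × 24 = 953064 layouts for one line, before considering non-contiguous regions, multiple lines or the different cache levels a line may live in.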

The PSLib provides a ready-made False Sharing scenario with a number of degrees of freedom, and the Perspec generator produces multiple tests from a single use-case, greatly increasing test-writer productivity. The beauty of the Portable Stimulus model is that these scenarios can be intermixed with your own to create stress tests that uniquely target your SoC. For example, if you want to mix cache stress with power management, this is readily achieved with Perspec.

Complex multithreaded use-cases can be created very easily for any number of cores, with randomly selected regions of shared memory; see the example below.

Through powerful constraint solver technology and a PSS model that defines data dependencies abstractly, independent of action ordering, Perspec is able to generate a huge number of specific test cases; the diagram above is one specific solution. This brings huge productivity to the test writer, as one test can yield hundreds of possible solutions, from which the user can pick one and run it on the SoC they are working on.
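Perspec’s constraint solver is far more capable than this, but the idea of one test description yielding many concrete, reproducible solutions can be sketched with plain seeded random generation. The C code below is such a stand-in, not Perspec’s algorithm: given a seed and a core count, it picks distinct cut points to divide one 64-byte line into contiguous, non-empty per-core regions. `fs_layout_t`, `generate_layout` and the limit of 8 cores are hypothetical; the small xorshift32 generator is used instead of `rand()` so that a seed reproduces the same layout on any platform.

```c
#include <stdint.h>

#define LINE_BYTES 64 /* assumed cache line size */
#define MAX_CORES  8  /* hypothetical upper limit for this sketch */

/* Hypothetical description of one generated solution: core c owns bytes
   [start[c], start[c] + len[c]) of a single cache line. */
typedef struct {
    unsigned cores;
    unsigned start[MAX_CORES];
    unsigned len[MAX_CORES];
} fs_layout_t;

/* xorshift32: a tiny, portable PRNG, so the same seed gives the same
   sequence everywhere. The state must never be zero. */
static uint32_t xorshift32(uint32_t *s) {
    uint32_t x = *s;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return *s = x;
}

/* Pick cores-1 distinct cut points in 1..LINE_BYTES-1, giving `cores`
   contiguous, non-empty regions assigned to cores in address order.
   Returns 0 on success, -1 for an unsupported core count. */
int generate_layout(uint32_t seed, unsigned cores, fs_layout_t *out) {
    if (cores == 0 || cores > MAX_CORES)
        return -1;
    uint32_t s = seed ? seed : 1u; /* avoid the all-zero state */
    unsigned char is_cut[LINE_BYTES] = {0};
    for (unsigned placed = 0; placed + 1 < cores;) {
        unsigned pos = 1 + xorshift32(&s) % (LINE_BYTES - 1);
        if (!is_cut[pos]) { /* reject duplicate cut points */
            is_cut[pos] = 1;
            placed++;
        }
    }
    out->cores = cores;
    unsigned c = 0, begin = 0;
    for (unsigned b = 1; b <= LINE_BYTES; b++) {
        /* a region ends at each cut point and at the end of the line */
        if (b == LINE_BYTES || is_cut[b]) {
            out->start[c] = begin;
            out->len[c] = b - begin;
            c++;
            begin = b;
        }
    }
    return 0;
}
```

Calling `generate_layout` twice with the same seed yields identical layouts, which mirrors picking one solution and rerunning it on the target, while a different seed gives a different layout. In a real bare-metal test each core would then write only its own region and a checker would verify the final contents of the line; shuffling which core gets which region would add further variety.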

In the next blog I will dig a little deeper into how tests are created and how users can use coverage to decide which test or tests they want to run.