I’m excited to introduce the most complex Carbon Performance Analysis Kit (CPAK) yet: an 8-core ARM Cortex-A53 system running 64-bit Linux with full Swap & Play support. This is also the first dual-cluster Linux CPAK available on Carbon System Exchange. It’s an important milestone for Carbon and for SoC Designer users because it enables system performance analysis for 64-bit multi-core Linux applications.
Here are the highlights of the system:
Here is a diagram of the system.
The design also supports fully automatic mapping to ARM Fast Models.
I would like to introduce some of the new functionality in this CPAK.
The Cortex-A53 model supports the CLUSTERIDAFF inputs to set the Cluster ID, which software sees in the MPIDR register. The two clusters use Cluster IDs 0 and 1, and each cluster has four cores. This means that CPU 3 in Cluster 1 has an MPIDR value of 0x80000103, as shown in the screenshot below.
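To make the encoding concrete, here is a minimal Python sketch of how the affinity fields combine into the MPIDR value software reads. It assumes the layout used in this system: bit 31 reads as 1, Aff1 (bits [15:8]) holds the Cluster ID driven on CLUSTERIDAFF, Aff0 (bits [7:0]) holds the CPU number within the cluster, and Aff2 is left at 0. The function names are hypothetical, not part of any Carbon or ARM API.

```python
# Hypothetical helpers illustrating the MPIDR layout of this 2 x 4 core system.
MPIDR_RES1 = 1 << 31  # bit 31 of MPIDR_EL1 always reads as 1


def mpidr(cluster_id: int, cpu_id: int) -> int:
    """Build the MPIDR value for a core, assuming Aff2 = 0."""
    return MPIDR_RES1 | ((cluster_id & 0xFF) << 8) | (cpu_id & 0xFF)


def affinity(mpidr_value: int) -> tuple[int, int]:
    """Decode (cluster_id, cpu_id) from the Aff1 and Aff0 fields."""
    return (mpidr_value >> 8) & 0xFF, mpidr_value & 0xFF


if __name__ == "__main__":
    for cluster in range(2):      # clusters 0 and 1
        for cpu in range(4):      # four cores per cluster
            print(f"cluster {cluster} cpu {cpu}: {mpidr(cluster, cpu):#010x}")
```

The last line printed is `cluster 1 cpu 3: 0x80000103`, the same value quoted above.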
Another requirement for a multi-cluster system is a Global System Counter. A new model, now available in SoC Designer, is connected to the CNTVALUEB input of each A53. This ensures that the Generic Timer in each processor presents the same counter value to software, even when the processors run at different frequencies. The model also enables Swap & Play systems to work correctly by saving the counter value from the Fast Model simulation and restoring it in the Cycle Accurate simulation.
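The behavior described above can be sketched as a small Python model. This is a hypothetical illustration, not the SoC Designer model itself: the counter advances at its own fixed frequency (100 MHz is an assumed value), so cores at different clock rates that sample it at the same simulated time see the same count, and a save/restore pair carries the count across a Swap & Play transition.

```python
# Hypothetical behavioral sketch of a global system counter (not Carbon's model).
NS_PER_S = 1_000_000_000
COUNTER_MASK = (1 << 64) - 1  # the distributed count is a 64-bit value


class GlobalSystemCounter:
    def __init__(self, freq_hz: int) -> None:
        self.freq_hz = freq_hz  # fixed counter frequency, independent of CPU clocks
        self.offset = 0         # adjusted when a saved value is restored

    def _ticks(self, time_ns: int) -> int:
        return time_ns * self.freq_hz // NS_PER_S

    def value(self, time_ns: int) -> int:
        """Count presented to every core (e.g. on CNTVALUEB) at time_ns."""
        return (self.offset + self._ticks(time_ns)) & COUNTER_MASK

    def save(self, time_ns: int) -> int:
        """Checkpoint the count, e.g. at the end of the Fast Model run."""
        return self.value(time_ns)

    def restore(self, saved: int, time_ns: int) -> None:
        """Make value(time_ns) == saved, e.g. when the cycle accurate run starts."""
        self.offset = saved - self._ticks(time_ns)


def read_on_core(counter: GlobalSystemCounter, cycle: int, core_freq_hz: int) -> int:
    """Count seen by a core clocked at core_freq_hz on its cycle-th clock edge."""
    return counter.value(cycle * NS_PER_S // core_freq_hz)


if __name__ == "__main__":
    counter = GlobalSystemCounter(100_000_000)  # assumed 100 MHz counter
    # A 1 GHz core at cycle 1000 and a 500 MHz core at cycle 500 sample at 1000 ns.
    print(read_on_core(counter, 1000, 1_000_000_000), read_on_core(counter, 500, 500_000_000))
```

Keeping the count in one place and offsetting it on restore, rather than keeping per-core copies, is what guarantees every core agrees on the time after the swap.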
To create a multi-cluster system the GIC-400 is used as the interrupt controller, and the A53 Generic Timers are used as the system timers. This requires the connection of the Generic Timer signals from the A53 to the GIC-400. All of these signals start with nCNT and are wired to the GIC. When a Generic Timer generates an interrupt it leaves the CPU by way of the appropriate nCNT signal, goes to the GIC, and then back to the CPU using the appropriate nIRQ signal.
As I wrote in my ARM Techcon Blog, 64-bit Linux uses nCNTPNSIRQ, but all signals are connected for completeness.
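The round trip from timer to GIC and back can be summarized in a few lines of Python. The interrupt IDs assume the GIC-400's fixed PPI assignment for the Generic Timer inputs (check the GIC-400 TRM for your configuration), and the helper names are hypothetical. Because these are PPIs, the interrupt stays private to the CPU interface of the core whose timer fired.

```python
# Hypothetical sketch of Generic Timer -> GIC-400 -> nIRQ routing.
# IDs assume the GIC-400 fixed PPI assignment for the nCNT* inputs.
TIMER_PPI_ID = {
    "nCNTHPIRQ": 26,   # hypervisor physical timer
    "nCNTVIRQ": 27,    # virtual timer
    "nCNTPSIRQ": 29,   # secure physical timer
    "nCNTPNSIRQ": 30,  # non-secure physical timer (used by 64-bit Linux)
}


def route_timer_interrupt(signal: str, cpu: int) -> tuple[int, int]:
    """A timer on `cpu` asserts `signal`; the GIC-400 receives it as a PPI
    private to that CPU interface and raises nIRQ on the same CPU.
    Returns (cpu receiving nIRQ, interrupt ID)."""
    return cpu, TIMER_PPI_ID[signal]


def dt_timer_interrupts() -> list[int]:
    """PPI numbers (interrupt ID - 16) in the order the arm,armv8-timer
    device tree binding lists them: secure, non-secure, virtual, hypervisor."""
    order = ["nCNTPSIRQ", "nCNTPNSIRQ", "nCNTVIRQ", "nCNTHPIRQ"]
    return [TIMER_PPI_ID[name] - 16 for name in order]


if __name__ == "__main__":
    print(route_timer_interrupt("nCNTPNSIRQ", 5))
    print(dt_timer_interrupts())
```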
Two additional signals that fall into the power management category and connect the two clusters are EVENTI and EVENTO. They carry the event communication used by the WFE (wait for event) and SEV (send event) instructions. In a single-cluster system all of this communication happens inside the processor, but in a multi-cluster system these signals must be connected between the clusters.
WFE and SEV communication is used during the Linux boot. All 7 of the secondary cores execute a WFE and wait until the primary core wakes them up using the SEV instruction at the appropriate time. If the EVENTI and EVENTO signals are not connected the secondary cores will not wake up and run.
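The wake-up sequence can be captured in a toy Python model. It is a hypothetical sketch loosely modeled on a spin-table style release: the mailbox, the default entry point, and the assumption that the primary core sits in cluster 0 are illustrative, not taken from the boot wrapper. It shows that with the EVENTO/EVENTI wires disconnected, the SEV never reaches the other cluster and those cores stay asleep.

```python
# Hypothetical toy model of secondary-core wake-up via WFE/SEV.
CLUSTERS = {0: [0, 1, 2, 3], 1: [4, 5, 6, 7]}
PRIMARY = 0  # assumed to be in cluster 0


def simulate_boot(event_wires_connected: bool, entry_point: int = 0x80000) -> set[int]:
    """Return the cores running after the primary releases the secondaries.

    entry_point is a hypothetical release address written to the mailbox."""
    mailbox = 0  # secondaries sit in WFE while this is still 0
    event = {cpu: False for cores in CLUSTERS.values() for cpu in cores}

    # Primary core: publish the entry point, then execute SEV.
    mailbox = entry_point
    for cpu in CLUSTERS[0]:            # SEV reaches every core in its own cluster
        event[cpu] = True
    if event_wires_connected:          # cluster 0 EVENTO drives cluster 1 EVENTI
        for cpu in CLUSTERS[1]:
            event[cpu] = True

    running = {PRIMARY}
    for cpu, has_event in event.items():
        # A secondary leaves WFE only when it receives an event, then re-checks
        # the mailbox; without an event it never wakes up.
        if cpu != PRIMARY and has_event and mailbox != 0:
            running.add(cpu)
    return running


if __name__ == "__main__":
    print(sorted(simulate_boot(True)))
    print(sorted(simulate_boot(False)))
```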
The good news is that all of the software used in the 8-core CPAK is easily downloadable in source code format. A small boot wrapper takes care of starting the cores and doing the minimal hardware configuration that Linux assumes is already done. Sometimes additional hardware programming is needed for proper cycle accurate operation that is not needed in a Fast Model system. These are similar to the issues I covered in another article titled “Sometimes Hardware Details Matter in ARM Embedded Systems Programming.”
Although not specific to multi-cluster systems, the A53 contains a bit named SMPEN in the CPUECTLR register, which must be set to 1 to enable hardware management of data coherency with the other cores in the cluster. The boot wrapper from kernel.org did not originally set this bit, and the Linux kernel assumes it is already set, so it was added to the boot wrapper during development.
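The change amounts to a read-modify-write of one bit. The Python sketch below models it, assuming SMPEN is bit 6 of CPUECTLR_EL1 (system register encoding S3_1_C15_C2_1), as I read the Cortex-A53 TRM; the comment shows roughly what the equivalent AArch64 instructions look like. The helper names are hypothetical, and this is not the actual boot wrapper patch.

```python
# Hypothetical model of setting SMPEN in CPUECTLR_EL1. In AArch64 assembly
# the boot-time sequence is roughly:
#   mrs x0, S3_1_C15_C2_1      // read CPUECTLR_EL1
#   orr x0, x0, #(1 << 6)      // set SMPEN
#   msr S3_1_C15_C2_1, x0      // write it back
CPUECTLR_SMPEN = 1 << 6


def set_smpen(cpuectlr: int) -> int:
    """Return the register value with SMPEN set and every other bit preserved."""
    return cpuectlr | CPUECTLR_SMPEN


def smpen_enabled(cpuectlr: int) -> bool:
    """True if the core takes part in hardware coherency within its cluster."""
    return bool(cpuectlr & CPUECTLR_SMPEN)


if __name__ == "__main__":
    print(hex(set_smpen(0)))
```

Using OR instead of writing a constant preserves any other CPUECTLR fields the reset configuration or earlier boot code has already set.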
Another hardware programming task that the Linux kernel assumes is already done is enabling snoop requests and responses between the clusters. The Snoop Control Register for each CCI-400 slave port is set to 0xc0000003 to enable coherency. This was also added to the boot wrapper during development of the CPAK.
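The register writes can be generated with a short Python sketch. It assumes the CCI-400 layout as I understand it from the TRM: slave interface n has its registers at the CCI base plus 0x1000 × (n + 1), with the Snoop Control Register at offset 0x000. The base address 0x2C090000 and the port numbers 3 and 4 in the example are hypothetical; use the values from your own memory map and design.

```python
# Hypothetical generator for the CCI-400 Snoop Control Register writes.
# Layout assumptions (verify against the CCI-400 TRM and your memory map):
#   slave interface n registers start at cci_base + 0x1000 * (n + 1)
#   Snoop Control Register is at offset 0x000 within that block
SLAVE_IF_STRIDE = 0x1000
SNOOP_CTRL_OFFSET = 0x000
# Bit 0 enables snoop requests and bit 1 enables DVM messages; bits 31:30
# are read-only support indicators, so writing them has no effect.
SNOOP_CTRL_VALUE = 0xC0000003


def snoop_control_writes(cci_base: int, slave_ports: list[int]) -> list[tuple[int, int]]:
    """Return the (address, value) pairs that enable coherency on each port."""
    return [
        (cci_base + SLAVE_IF_STRIDE * (port + 1) + SNOOP_CTRL_OFFSET, SNOOP_CTRL_VALUE)
        for port in slave_ports
    ]


if __name__ == "__main__":
    # Example only: the base address and port numbers are assumptions.
    for addr, value in snoop_control_writes(0x2C090000, [3, 4]):
        print(f"write {value:#010x} to {addr:#010x}")
```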
The gaps between the boot wrapper functionality and Linux assumptions are somewhat expected since the boot wrapper was developed for ARM Fast Models and these details are not needed to run Linux on Fast Models, but nevertheless they make it challenging to create a functioning cycle accurate system. These changes are provided as a patch file in the CPAK so they can be easily applied to the original source code.
The CPAK comes with an application note which covers the construction of the Linux image.
The following items are configured to match the minimal hardware system design, and can be extended as the hardware design is modified.
All of the above items are compiled into a single executable file (.axf file). This one image contains all of the artifacts and is loaded and executed in SoC Designer.
One of the amazing things is that no kernel source code changes are required. This shows how far Linux has come in the ARM world and how flexible it now is in supporting a wide variety of hardware configurations.
An octa-core A53 Linux CPAK is now available which supports Swap & Play. The ability to boot the Linux kernel using Fast Models and migrate the simulation to cycle accurate execution enables system performance analysis for 64-bit multi-core systems running Linux applications.
Also, make sure to check out the other new CPAKs for 32-bit and 64-bit Linux for Cortex-A53 now available on Carbon System Exchange.
The “Brought up 8 CPUs” message below tells it all. A number of 64-bit Linux applications are provided in the file system, but users can easily add their favorite programs and run them by following the instructions in the app note.
Unfortunately, I don't have a real hardware system with this 8-core configuration. I guess it's one of the drawbacks of working at a software company. Thanks for reading.
Great! Does this only work on a virtual machine? Could it be tested on a real hardware system at this point?