I started my design career far too long ago doing system verification on a multi-processor server design. Basically, I was charged with assembling a model of the system and then writing some tests to exercise it. This was long before the days of virtual prototypes, so I assembled the system in RTL simulation using an LMSI hardware modeler to represent the existing processor and cache components in the system. When it came time to get software up and running on the system, I started off by writing a few directed tests to run on the processors in the design. These tests were designed to stress the system, but configuring all of the components started to become burdensome, so I went to the software team, borrowed the code they were writing for the eventual real silicon, and got it up and running on my system model. After spending far too much time figuring out the problems in my own verification environment (it was my first job, after all), I started finding real system problems. Software-driven verification was finding problems that the hardware verification team had missed. Since this was the first time that software was being run on the real hardware, albeit as a simulation model, we found numerous problems in both the hardware and the software.
A few years later, I migrated from design work to working as an applications engineer at Quickturn Systems. I saw firsthand the huge amounts of money and design resources that companies would allocate to assemble a model of the system before silicon. They were taking much the same approach I had used in my first job: running real software on real hardware. Instead of assembling the system in software talking to a hardware modeler, they were using a cobbled-together system with a washing-machine-sized emulator hooked into a specially designed hardware board with what seemed like miles of spaghetti-like cables in between. The hardware teams would typically use the systems during the daytime hours to do their system verification work, with the software teams relegated to nighttime hours for their time on the box. (After all, emulators were an expensive resource. It made sense to schedule them for round-the-clock usage, and I spent more than one 2am session in the lab to help keep the boxes running.) The value was high, though. The interaction of real hardware and real software before silicon accelerated design schedules and found corner-case functional and performance issues that would have otherwise made their way into silicon. It was difficult and it was expensive, but for many design teams it was worth it.
Fast forward to the present day and the path to this system-level validation is evolving once again. We’ve seen an increasing number of design teams adopt a validation strategy that uses system-level software to drive validation of the design long before silicon. While this has historically been done using the actual system software (getting to that boot prompt well before tapeout is still a typical milestone), many design teams are now crafting software specifically for the purpose of validating their system. This software can either be targeted software they’ve written themselves or third-party verification software from leading companies like Breker Systems. We recently published an article together with Breker in EETimes which talks about doing precisely this to address the huge problem of coherency validation in the newest generation of ARMv8-based SoC designs. It goes into a good amount of depth on cache coherency, so I'd certainly recommend it if your next design is using hardware coherency. The CPAK discussed in the article uses two clusters of four ARM Cortex-A53 cores each, but it can easily be modified to better represent your actual design.
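To make that concrete, here's a rough, host-runnable sketch of the kind of access pattern such validation software exercises. It isn't Breker's code or anything from the CPAK, just my own illustration using C++ threads, where eight workers (a stand-in for the eight Cortex-A53 cores) hammer counters packed into a single cache line and then check for lost updates.

```cpp
// Illustrative coherency stress pattern (host-runnable sketch only -- real
// validation software runs bare-metal on the SoC's cores and targets
// specific interconnect and cache states).
#include <atomic>
#include <cstdint>
#include <iostream>
#include <thread>
#include <vector>

constexpr int kThreads    = 8;       // stand-in for two quad-core A53 clusters
constexpr int kIterations = 100000;

// Counters deliberately packed into the same cache line so every increment
// forces the line to bounce between cores (false sharing on purpose).
struct SharedLine {
    std::atomic<std::uint64_t> counter[kThreads];
};

int main() {
    SharedLine line{};

    std::vector<std::thread> workers;
    for (int t = 0; t < kThreads; ++t) {
        workers.emplace_back([&line, t] {
            for (int i = 0; i < kIterations; ++i) {
                // Read-modify-write on a contended line: the update must
                // survive the line migrating to and from other cores.
                line.counter[t].fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& w : workers) w.join();

    // Check for lost updates: each per-thread counter must equal kIterations.
    bool pass = true;
    for (int t = 0; t < kThreads; ++t) {
        if (line.counter[t].load() != kIterations) pass = false;
    }
    std::cout << (pass ? "coherency check passed" : "LOST UPDATES DETECTED")
              << std::endl;
    return pass ? 0 : 1;
}
```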
While emulation and FPGAs remain a popular, albeit expensive, way to execute this validation software, an increasing number of teams have been performing this valuable step on virtual prototypes. Using virtual prototypes together with system software isn’t new, of course; that’s been done for a long time. The latest wrinkle, though, is the ability to do this software development on a virtual prototype that is actually an accurate representation of the system. Traditionally, virtual prototypes have been functional models only and have abstracted away the implementation details of the system in order to achieve performance. Today, it is possible to use virtual prototypes that combine the speed of those high-level models (tens to hundreds of MIPS) with the accuracy of the RTL implementation. What’s more, many of these systems are already built and have system software already running on them.
Of course, when I say RTL-accurate, alarm bells start going off. Does this mean I have to debug my software using waveforms? Am I going to have to learn how to run a hardware simulator? And of course: don’t accurate models run too slowly to execute real software? Thankfully, the answer to all of these questions is no. Let’s see why.
Most of the time, the fastest way to get a working model of your system is to take a working model of a similar system and port it to more closely represent your design. This is the reason we have so many CPAKs on our System Exchange web portal. Using the search parameters, you can easily narrow down the lengthy list of pre-built systems (well over 100 as this blog is written) to the one that most closely matches your design. You can even choose the software you want to run, from the simplest bare-metal benchmark to a full Linux boot and OS-level benchmarks.
Once downloaded, the CPAK can be easily customized to mimic your actual design. You can do this using models from IP Exchange, RTL models you’ve compiled with Carbon Model Studio, or SystemC models. These can either add to the supplied system or replace existing components.
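To give a flavor of the SystemC path, here is a bare-bones, generic TLM-2.0 target (a hypothetical scratchpad memory) of the sort you might drop into a downloaded system alongside, or in place of, an existing component. The module and its names are my own invention; the actual CPAK integration steps depend on your tool flow.

```cpp
// Hypothetical SystemC/TLM-2.0 scratchpad model -- a generic sketch of a
// component that could be added to a system, not a Carbon-specific API.
#include <systemc>
#include <tlm>
#include <tlm_utils/simple_target_socket.h>
#include <cstring>
#include <vector>

struct Scratchpad : sc_core::sc_module {
    tlm_utils::simple_target_socket<Scratchpad> socket;

    explicit Scratchpad(sc_core::sc_module_name name, unsigned size_bytes = 4096)
        : sc_module(name), socket("socket"), mem_(size_bytes, 0) {
        socket.register_b_transport(this, &Scratchpad::b_transport);
    }

    // Blocking transport: service reads and writes from the local array.
    void b_transport(tlm::tlm_generic_payload& trans, sc_core::sc_time& delay) {
        const sc_dt::uint64 addr = trans.get_address();
        unsigned char* ptr      = trans.get_data_ptr();
        const unsigned len      = trans.get_data_length();

        if (addr + len > mem_.size()) {
            trans.set_response_status(tlm::TLM_ADDRESS_ERROR_RESPONSE);
            return;
        }
        if (trans.is_write())
            std::memcpy(&mem_[addr], ptr, len);
        else if (trans.is_read())
            std::memcpy(ptr, &mem_[addr], len);

        delay += sc_core::sc_time(10, sc_core::SC_NS);  // nominal access latency
        trans.set_response_status(tlm::TLM_OK_RESPONSE);
    }

private:
    std::vector<unsigned char> mem_;
};
```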
Your next step depends upon your design needs. If you want to develop high-level software without the need for system accuracy, you can certainly do so. Simply use the ARM Fast Model representation of the system. This enables you to run at Fast Model speeds, typically in the tens to hundreds of MIPS. You can even execute in a hybrid configuration if desired, mixing Fast Model components together with Carbonized RTL models. This is a common use case for components such as GPUs, which don’t have Fast Model representations. The system runs at Fast Model speeds except when accessing the GPU or rendering a frame. This approach enables a fast OS boot before beginning video operations, which then run accurately since the GPU model is RTL-accurate. Bear in mind, of course, that any hybrid combination of Fast Models and accurate models will not generate the same types of accesses to the GPU as would be seen in a real system. This is true whether the hybrid combination takes place entirely in the virtual world or by tying a virtual prototype to an emulator. Since Fast Model representations are functional only and don’t attempt to model cycle accuracy, this type of approach is well-suited only for software development, not for system architecture or validation. For those tasks, only an accurate system model will do.
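To illustrate what "accessing the GPU" looks like from the software side, here is a small sketch of driver-style MMIO accesses. The base address, register offsets, and names are entirely invented, but stores and loads like these are what land in the RTL-accurate GPU model while the rest of the system continues at Fast Model speed.

```cpp
// Illustrative only: a hypothetical GPU register block and the kind of MMIO
// accesses that exercise the RTL-accurate GPU model in a hybrid simulation.
// The base address, offsets, and register layout are invented for this sketch.
#include <cstdint>

constexpr std::uintptr_t kGpuBase = 0x2D000000;  // hypothetical base address

volatile std::uint32_t* gpu_reg(std::uintptr_t offset) {
    return reinterpret_cast<volatile std::uint32_t*>(kGpuBase + offset);
}

void render_frame(std::uint64_t framebuffer_pa) {
    // Everything up to here (OS boot, application setup) runs on Fast Models;
    // these register accesses execute against the Carbonized GPU RTL.
    *gpu_reg(0x0010) = static_cast<std::uint32_t>(framebuffer_pa);        // FB address, low word
    *gpu_reg(0x0014) = static_cast<std::uint32_t>(framebuffer_pa >> 32);  // FB address, high word
    *gpu_reg(0x0000) = 0x1;                                               // kick off the frame

    while ((*gpu_reg(0x0004) & 0x1) == 0) {
        // poll a hypothetical "frame done" status bit
    }
}
```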
This brings us back to where we started: using software on an accurate representation of the system to drive validation. Before we talk more about that, though, I should answer the questions I raised above about debugging and execution speed. The debugging question is an easy one. Although the waveforms are there if you really want to be masochistic, all Carbon models of ARM IP available on our IP Exchange web portal include an integration with ARM’s DS-5 debugger to enable truly interactive debugging. This isn’t a post-process “gee, I wish I could change that value but it’s too late now” integration. It’s one that enables the designer to view and modify the contents of any register or memory location while the program is running. The entire system also runs in a complete virtual prototype environment, so no hardware simulator connection is necessary. No complicated command lines are needed and no extra licenses need to be checked out at runtime.
This of course brings us back to the speed question. After all, functional models run faster than accurate models precisely because they’ve eliminated accuracy. How can we get the speeds needed to boot an OS or develop application-level software and still expect to have accuracy? This is where Carbon’s Swap & Play technology comes to the rescue. Our virtual prototypes and CPAKs can start running on a Fast Model representation of the system and then swap over to 100% accurate models at any software breakpoint. This approach lets you boot your OS in under a minute, far faster than with any emulator or FPGA prototype, and then continue running with the accurate representation. That enables the tasks that require accuracy, such as performance optimization or system validation. You can even create multiple breakpoints to start running accurately at different points in the system execution.
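One simple convention, purely my own illustration rather than a feature of the tools, is to compile a no-op marker function into the benchmark and set the software breakpoint on its symbol, so the swap to accurate models lands exactly where the interesting work begins.

```cpp
// Illustrative convention: a no-inline marker the debugger can break on so the
// switch from Fast Models to accurate models happens right where the measured
// workload starts. The function and workload names are my own invention.
extern "C" __attribute__((noinline)) void swap_to_accurate_here() {
    asm volatile("" ::: "memory");  // keep the call from being optimized away
}

int main() {
    // ... OS boot and benchmark setup run at Fast Model speed ...
    swap_to_accurate_here();   // set the Swap & Play breakpoint on this symbol
    // run_workload();         // hypothetical workload, now executing on
                               // 100% accurate models
    return 0;
}
```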
Software can be a very effective way to verify the behavior of an SoC long before tapeout. Whether you're using actual system software to do this or leveraging dedicated system verification software from companies like Breker, you have the ability to see true system behavior and fix problems earlier in the design cycle. Virtual prototypes simplify this task with their ability to offer true interactive debugging of both hardware and software with complete visibility. Carbon's CPAKs offer a great way to further accelerate the development of these systems and let that verification task start earlier in the cycle.