In this article I will explain how ARM® uses FPGA boards during the development of new cores to validate the design and how this technique is particularly useful to reduce project time and costs, giving an effective competitive advantage in the market.
The complexity of designing a new SoC has increased steadily over the last 10 years, driven by the “smartphone/tablet revolution”, which pushed the semiconductor industry to release ever faster and less power-hungry chips. Time-to-market requirements have shrunk as well, adding to the challenge for engineers working in this sector: they must develop a complicated design and verify it thoroughly in a very short time, mainly through pre-silicon verification, since the increased cost of silicon masks has made test chips less attractive.
With the increasing complexity of cores, the validation phase has become even more important during development. The risk of shipping buggy SoCs must be minimized through careful analysis and the use of different verification and validation processes, starting from the very beginning of the project and continuing through each design phase until the end of the life cycle of the chip.
The design of the SoC is usually just a small part of the entire effort necessary to develop a new product - the majority of the costs are in verifying the product and developing the software to run on it.
The following chart (IBS 2013 data) shows an interesting analysis of the costs involved:
The chart shows that the costs associated with validation and software have grown exponentially as process technology has scaled; this is why both have risen dramatically in importance, not only at the end of the design phase but from the beginning of the project, where hardware design, verification and software development are performed simultaneously.
There are three main methods to test a design before tape-out. Each has different advantages and disadvantages, so all three are used during the development of a core.
Simulation is used from the very beginning of the design because it gives a complete view of the signals in the RTL, and it is often used to verify functionality at the module level. When the structure of the chip is close to completion, the limits of this method quickly become apparent: the huge computational power required limits execution speed to about 100 Hz for the latest cores, which makes booting an operating system, for example, take far too long.
The performance problem can be partially solved by using an emulator which, with specialized hardware, can reach a speed of 1 MHz while keeping the same complete view of the internal state of the system. The main disadvantages of emulators are the cost of ownership, which is an order of magnitude higher than that of the other methods, and, again, the time required to boot an actual operating system, which is still too long for software developers who need to reboot a few times a day when developing kernels and device drivers.
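To make the speed gap concrete, here is a minimal back-of-the-envelope sketch. The two clock speeds are the ones quoted above; the cycle count for an operating system boot (`BOOT_CYCLES`) is purely an illustrative assumption, not a measured figure, and the helper names are hypothetical.

```python
# Rough wall-clock estimate for running a workload on a pre-silicon platform.
SIMULATOR_HZ = 100          # speed quoted in the article for the latest cores
EMULATOR_HZ = 1_000_000     # speed quoted in the article for an emulator

# ASSUMPTION: illustrative cycle count for an OS boot, not a measured value.
BOOT_CYCLES = 1_000_000_000


def run_time_seconds(cycles: int, clock_hz: float) -> float:
    """Seconds needed to execute `cycles` clock cycles at `clock_hz`."""
    return cycles / clock_hz


def describe(seconds: float) -> str:
    """Format a duration using the largest convenient unit."""
    if seconds >= 86_400:
        return f"{seconds / 86_400:.1f} days"
    if seconds >= 3_600:
        return f"{seconds / 3_600:.1f} hours"
    if seconds >= 60:
        return f"{seconds / 60:.1f} minutes"
    return f"{seconds:.1f} seconds"


for name, hz in (("simulator", SIMULATOR_HZ), ("emulator", EMULATOR_HZ)):
    print(f"{name}: {describe(run_time_seconds(BOOT_CYCLES, hz))}")
```

Under that assumed cycle count, the simulator needs months and the emulator a quarter of an hour or so per boot, which is why neither suits a developer who reboots several times a day.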
Last but not least is the use of FPGAs. This method has often been relegated to the later stages of validation, or used to easily replicate a design for software developers, since it does not provide the complete (or even a sufficient) view of the system state that hardware developers need to find the source of an error. Additionally, the burden of setting up an FPGA with the right hardware and all the components needed for a functional FPGA prototyping board makes this method less attractive for designers.
Even though a simulator is the cheapest and most scalable option for testing a design, the cost of each solution must be considered together with the number of gates and the speed achievable. The following graph shows how the different approaches compare using this new metric, relative to a final silicon chip:
As can easily be deduced from the graph, the simulator has the least attractive price/speed ratio whilst the FPGA has the most attractive one. This is the main reason that convinced ARM to start using FPGAs as an integral part of its verification process and to develop a new FPGA farm for this purpose.
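The metric behind the graph (written £/MHz.MG in the conclusion) can be sketched as below. This assumes that "MG" stands for millions of gates and that the metric is simply cost divided by the product of speed and capacity; neither is spelled out in the article. The function names and the numbers in the usage lines are hypothetical placeholders, not ARM data.

```python
def cost_per_mhz_per_mgate(cost_gbp: float, speed_mhz: float,
                           capacity_mgates: float) -> float:
    """Cost in pounds per MHz of speed per million gates of capacity.

    ASSUMPTION: this is one plausible reading of the article's
    "£/MHz.MG" metric; lower values mean better value for money.
    """
    if speed_mhz <= 0 or capacity_mgates <= 0:
        raise ValueError("speed and capacity must be positive")
    return cost_gbp / (speed_mhz * capacity_mgates)


def relative_to_reference(value: float, reference: float) -> float:
    """Express a platform's metric as a multiple of a reference platform's."""
    return value / reference


# Placeholder inputs only, chosen to make the arithmetic easy to follow.
platform = cost_per_mhz_per_mgate(cost_gbp=1000.0, speed_mhz=10.0,
                                  capacity_mgates=5.0)
reference = cost_per_mhz_per_mgate(cost_gbp=100.0, speed_mhz=10.0,
                                   capacity_mgates=5.0)
print(platform, relative_to_reference(platform, reference))
```

Dividing by both speed and gate capacity is what lets a slow but cheap platform and a fast but expensive one be compared on a single axis.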
Even though an additional step is required to port the design to an FPGA, the cost and speed benefits far outweigh it and become crucial in the validation phase. Software developers can connect to an FPGA board directly through a JTAG connection and debug their code running on a development core. This benefits both hardware and software engineers: the SoC can be verified against real code, and software engineers can test their code at early stages of the SoC development.
The following diagram shows the general structure of the farm developed and used within ARM:
The structure of the farm has been designed for scalability. Each “module” of the farm is composed of a Linux server, an mbed and four pairs of FPGA board and DSTREAM debugger. The server is the main access point for users of the farm and, through a terminal, gives access to all of the operations that can be executed on the FPGA, from mapping the design onto the FPGA to reading from and writing to the serial port.
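The module layout described above can be modelled as a small data structure. This is only an illustrative sketch: the class names, field names and host names are hypothetical and do not reflect how ARM's farm software is actually written. Only the composition (one server, one mbed, four board/DSTREAM pairs) comes from the article.

```python
from dataclasses import dataclass, field

BOARDS_PER_MODULE = 4  # from the article: four FPGA board/DSTREAM pairs


@dataclass(frozen=True)
class DebugSlot:
    """One FPGA board paired with its DSTREAM debug and trace unit."""
    fpga_board: str
    dstream: str


@dataclass
class FarmModule:
    """One scalable unit of the farm: a server, an mbed and up to four slots."""
    server: str
    power_controller: str  # the mbed board
    slots: list[DebugSlot] = field(default_factory=list)

    def add_slot(self, slot: DebugSlot) -> None:
        if len(self.slots) >= BOARDS_PER_MODULE:
            raise ValueError("module is full; add another module instead")
        self.slots.append(slot)


def build_module(index: int) -> FarmModule:
    """Create a fully populated module with hypothetical host names."""
    module = FarmModule(server=f"farm-server-{index}",
                        power_controller=f"mbed-{index}")
    for n in range(BOARDS_PER_MODULE):
        module.add_slot(DebugSlot(f"vexpress-{index}-{n}",
                                  f"dstream-{index}-{n}"))
    return module


# Scaling the farm means adding whole modules, not resizing existing ones.
farm = [build_module(i) for i in range(3)]
print(sum(len(m.slots) for m in farm), "boards in", len(farm), "modules")
```

Keeping each module self-contained is what makes the design scale: capacity grows by replicating a known-good unit.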
The engineers who designed the farm did not have to look very far to find a suitable FPGA board, as ARM already had what they were looking for: ARM’s Versatile Express series boards were a perfect fit for the farm, giving all the necessary flexibility and scalability.
ARM Versatile™ Express boards minimize the time required to set up a prototype and allow engineers to concentrate on IP or software instead of debugging the development system.
The architecture of the Versatile Express board has been designed for the development of future ARM cores and, with all the necessary peripherals, gives engineers a very flexible instrument to debug and test IP before sending it to customers. To overcome the limited visibility of the internal signals inside the cores, each Versatile Express has been connected to a Synopsys Protolink, which provides simulator-like visibility with little effort. Protolink can be used in different ways:
To add even more debug capabilities, a DSTREAM debug and trace device is connected through a JTAG port to each Versatile Express. DSTREAM is not just a simple JTAG debug interface but a fully customizable device for accessing all the features of a target’s ARM CoreSight™. When paired with the new CoreSight System Trace Macrocell (STM), DSTREAM becomes a powerful debugging tool which not only gives the execution trace of the cores but also records selected software and hardware events. The STM supports time stamping and can be used by any device on the system main bus, providing time-correlated information on events generated by CPUs, GPUs or DMA controllers.
A remarkable advantage of using DSTREAM over a “normal” debug device is that it can be configured for new devices that are still in development. A good example of its successful use in a real scenario was the development of the ARM Cortex®-A53 and Cortex-A57 processors: DSTREAM already supports ARMv8 code trace and, using the CoreSight Access Tool, it was possible to have debugging capabilities on the cores before a debug connection to the targets was available. It is also possible to create configurations for third-party IP to enable debug during the development of complex targets running on the Versatile Express.
Even though DSTREAM provides a faster USB 2.0 connection, Ethernet was chosen for the FPGA farm to enable remote access simply by connecting it to the existing network infrastructure.
mbed is a cheap and effective way to develop a prototype, but there is nothing to stop you from using it for fully functional solutions! That’s exactly what happened in this case: mbed has a full set of peripherals such as Ethernet, USB and I/O and, most importantly, it is very easy to program. ARM engineers used a simple mbed board to develop a simple but functional system to manage the power of the DSTREAMs, Versatile Express boards and Synopsys Protolink adapters. The system, connected through Ethernet to the server, allows remote users to reset and power cycle each individual device they are working on without moving from their desks.
Software developers can use DS-5 Development Studio for the whole life cycle of SoC development, from the design phase, where no silicon is available, right through to the final product. This aspect is particularly important since the cost and time needed to become familiar with different tools with different characteristics can jeopardize the advantages of having early access to the core in development.
The usage of the FPGA farm is not limited to CPU design within ARM: the Media Processing Division uses it as a part of its development process to test its GPUs.
The scalability of the solution has been proven by the 86 FPGA boards currently installed in two farms in Cambridge, with more being installed. Access to these boards is guaranteed from all ARM sites, and the usage of FPGAs is now essential to the development of all new cores in ARM, from both a hardware and a software point of view. The Media Processing Division shares all of the existing infrastructure with the Processor Division, which is proof of the great flexibility achieved thanks to the ARM tools.
Conclusion
In this article we analysed the three main methods currently available for the verification and validation of a new chip design. We introduced the £/MHz.MG metric to compare the different methods effectively and, in particular, we described how FPGAs can be used effectively for this purpose. We described the current FPGA farm used within ARM to test and verify new designs, and how ARM products such as Versatile Express and DSTREAM simplify the whole process and enable both hardware and software developers to collaborate effectively on the same project.
The scalability and effectiveness of the solution have been proven during the last two years, in which ARM engineers at different sites around the world (Cambridge, Austin, Sophia-Antipolis) developed the first two ARMv8 cores, Cortex-A53 and Cortex-A57, while software developers were developing the supporting software in parallel.