Co-authored with Giacomo Gabrielli and Jose Joao
The Arm Scalable Vector Extension (SVE) is a key technology for Arm processors to efficiently address the increasing computation requirements of future high-performance computing (HPC), data analytics and machine learning applications. Designing systems to address such requirements, however, is not straightforward, and particular attention goes to the design parameters of the vector units. In this regard, SVE gives a lot of flexibility to computer architects: for instance, it doesn’t even mandate a specific vector length! While this flexibility is great, it also reinforces the importance of having scalable and accurate modeling tools, so that computer architects can confidently anticipate the expected performance/power/area of their designs.
Software models, aka simulators, are a fundamental item in the toolbox of computer architects: they allow rapid prototyping of complex micro-architectures to gain relevant insights to drive the design phase. At the same time, software models are one of the main vehicles for academia to prototype innovative techniques and ideas at the architectural and micro-architectural level. The vector unit design significantly impacts other parts of the processor design, especially the memory hierarchy, which must be able to feed data into the processor at the rate that the vector units can process it. Therefore, SVE simulation must properly model the whole system and the interactions among its components. gem5 is an open-source full-system micro-architectural simulator that is widely used in both academia and industry. Arm is a major contributor to gem5 and has developed and upstreamed many features and models over the past decade. SVE support in gem5 has recently been upstreamed to a development branch, and it is in the process of being merged into mainline.
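To make the point about feeding the vector units concrete, here is a back-of-the-envelope sketch of how the required data rate grows with vector length. The function names and the example figures (two vector loads per cycle, a 2 GHz clock) are assumptions chosen purely for illustration; they do not describe any particular Arm core or any gem5 configuration.

```python
# Back-of-the-envelope sketch: bandwidth needed to keep vector load ports busy.
# All names and numbers are hypothetical and only illustrate the arithmetic.

def bytes_per_cycle(vl_bits, loads_per_cycle):
    """Bytes the memory hierarchy must deliver each cycle at full load rate."""
    return (vl_bits // 8) * loads_per_cycle


def bandwidth_gb_per_s(vl_bits, loads_per_cycle, freq_ghz):
    """Same requirement in GB/s (1 GB = 1e9 bytes) at a given clock frequency."""
    # bytes/cycle * (freq_ghz * 1e9 cycles/s) / 1e9 bytes per GB
    return bytes_per_cycle(vl_bits, loads_per_cycle) * freq_ghz


if __name__ == "__main__":
    # Assumed design point: 2 vector loads per cycle at 2 GHz.
    for vl in (128, 256, 512, 1024, 2048):
        print(vl, bytes_per_cycle(vl, 2), bandwidth_gb_per_s(vl, 2, 2.0))
```

Quadrupling the vector length quadruples the peak demand on the first-level cache in this simple model, which is why the vector units and the memory system have to be evaluated together rather than in isolation.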
In an effort to increase the visibility of SVE and to introduce the newly available support for SVE in gem5, Alejandro Rico, Giacomo Gabrielli and Jose Joao hosted the 'gem5 SVE tutorial' at the International Conference on Supercomputing in Beijing in June 2018. The tutorial covered the features of SVE, the trade-offs of designing a multi-core that uses vectors, and the publicly available tools to model the performance of such vector architectures, with an emphasis on gem5 with SVE support. In addition to gem5, the tutorial also covered other analysis tools for SVE, such as the Arm Instruction Emulator.
Hosting the tutorial at ICS in Beijing gave Arm the opportunity to further increase the visibility of SVE and gem5, and to reach out to several academic groups and business partners in China. The attendance was great, and the organizers were very pleased with the outcome of the tutorial – it was a great pleasure to see a high level of interaction throughout the session and to answer lots of interesting questions too.
There were around 30 attendees from academia and industry in China, many of them with good working knowledge of SVE. There were great discussions on the availability and features of the Arm tools for SVE, including Arm HPC Compiler, Arm Performance Libraries and Arm Instruction Emulator, in addition to the new SVE modeling capabilities in gem5. These tools are fully supported by Arm and are accessible through the Arm HPC Tools website.
"It’s a really helpful SVE tutorial - we enjoyed a wide and deep discussion with the Arm Research specialists. As well as the SVE programming, the speakers also gave us the background and benefits of the Arm SVE architecture. I hope more and more of these kinds of seminars will be held in China!"
Zhou Yongbin, HiSilicon
The design of large SoCs with many cores remains a challenging task because of all the trade-offs that need to be balanced. Modeling remains fundamental for these tasks, especially for gaining insight into the interactions between core and system microarchitecture. It is also fundamental to have specific models of state-of-the-art technologies to faithfully estimate the impact of new ideas. The vector-length agnosticism of SVE gives partners great flexibility to tailor their designs to the target market in terms of performance, power and area. Adding vector units for efficiency also enlarges the design space to be explored, which in turn calls for capable modeling tools.
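As a rough illustration of what vector-length agnosticism means for software, the sketch below emulates a predicated loop in plain Python. It is not SVE code and not a gem5 API: the helper names (`whilelt`, `vla_add`, `vector_iterations`) are made up for this example, and the check on the vector length assumes the range of 128 to 2048 bits in multiples of 128 bits described for SVE. The same loop produces the same result for every vector length; only the number of vector iterations changes.

```python
# Sketch only: plain-Python emulation of a vector-length-agnostic (VLA) loop.
# Helper names are illustrative and do not correspond to a real Arm or gem5 API.

def whilelt(start, n, lanes):
    """Build a predicate: lane i is active while start + i < n."""
    return [start + i < n for i in range(lanes)]


def vla_add(a, b, vl_bits, elem_bits=32):
    """Element-wise a + b, processed one emulated vector at a time."""
    if vl_bits % 128 != 0 or not 128 <= vl_bits <= 2048:
        raise ValueError("expected a multiple of 128 bits between 128 and 2048")
    lanes = vl_bits // elem_bits  # elements per vector, unknown to the 'program'
    n = len(a)
    out = [0] * n
    i = 0
    while i < n:
        pred = whilelt(i, n, lanes)  # masks off lanes past the end of the data
        for lane, active in enumerate(pred):
            if active:
                out[i + lane] = a[i + lane] + b[i + lane]
        i += lanes  # advance by however many lanes this 'hardware' has
    return out


def vector_iterations(n, vl_bits, elem_bits=32):
    """Number of vector iterations needed for n elements (ceiling division)."""
    lanes = vl_bits // elem_bits
    return -(-n // lanes)
```

For ten 32-bit elements, a 128-bit implementation needs three iterations while a 512-bit implementation needs one, yet the loop body is identical. That is the property that lets the same binary run across designs with different vector lengths, and it is also why the vector length becomes one more axis in the design space to be modeled.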
With support for the Arm Scalable Vector Extension, gem5 is a valuable addition to the processor designer's toolbox. It enables simulation of Armv8 multi-core architectures, either symmetric or big.LITTLE, together with network-on-chip and memory models, and includes sampling, checkpointing and fast-forwarding features to reduce simulation time.
gem5 is an open-source project that Arm has supported for many years with significant contributions, and Arm remains heavily involved in its further development to make it the tool of choice for computer architecture simulation.
For a quick start with gem5, Arm has put together a Research Enablement Kit with a set of scripts to easily download, compile and execute benchmark suites such as the PARSEC multithreaded benchmarks.
Given the success of the tutorial at ICS, it will be repeated at the Arm Research Summit 2018, hosted at Robinson College, Cambridge, UK, from September 17 to 19, 2018. The tutorial will cover the Scalable Vector Extension and take a deep dive into the features, capabilities and usage of gem5, with an emphasis on SVE support. It will also include a session on the newly added quality-of-service (QoS) modeling features, including the experiences of several Arm partners using them.
The tutorial will take place on September 18, 2018 with the following tentative agenda:
Register for the Summit
The International Conference on Supercomputing, held since 1987, is the premier international forum for the presentation of research results in high-performance computing systems. The focus of ICS is on high-performance computers and computation:
The next edition of ICS will take place in Phoenix, AZ, USA, from June 26 to 28, 2019.