You might have noticed that around this time of year we start to talk about our latest high-end GPU. Well, 2017 is no different, and we in the Arm Mali team are delighted to welcome Mali-G72 to the High Performance GPU roadmap.
Following on from Mali-G71 released last year, Mali-G72 was launched at Computex 2017 and builds on the introduction of the awesome Bifrost architecture to provide even greater performance within an ever smaller area and power budget. Designed for High Fidelity Mobile Gaming and the emerging field of Machine Learning (ML) on device, Mali-G72 also takes the VR capability of Mali-G71 to a whole new level. Mali-G72 based devices will have 1.4x the overall graphics performance compared to devices based on its predecessor, guaranteeing it’s ready to meet the needs of whatever fantastic new tech hits the industry next.
As I mentioned before, one of the major driving forces behind Mali-G72 is the rise of High Fidelity Gaming on mobile. Whilst there is still a huge market for casual games like Candy Crush, we’re seeing more and more growth in the revenue generated by complex games, with 43%* of the Chinese mobile gaming industry now made up of these titles. Photorealistic visuals, like the ones in Digital Legends’ first-person shooter, Afterpulse, used to be impossible on mobile. The power consumption of high vertex counts, numerous draw calls and more complex vertex and fragment shaders, as well as advanced graphics effects like dynamic shadows, was simply too high for the mobile form factor and reduced both quality and playtime. We consult and collaborate with our incredible ecosystem of partners and developers to ensure our newest products meet the needs of the market no matter their individual priorities. We worked closely with Digital Legends to ensure that the latest advanced rendering techniques could be supported alongside our fantastic optimization tools to maximize both performance and efficiency, and were able to attain a 42% write bandwidth saving over Mali-G71. Add in the use of Pixel Local Storage (PLS) and you can save an additional 45%, making a total read bandwidth saving of 68%. It’s this collaboration which breeds innovations like those in the Mali-G72 and makes feature-rich games like Afterpulse a reality for mobile gamers.
*Newzoo research based on top 200 revenue generating games
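The 42% and 45% bandwidth figures quoted above compound rather than add, because the second saving applies only to the traffic left over after the first. A minimal sketch of that arithmetic, assuming both percentages are fractions of the same bandwidth stream (the function name is illustrative and not part of any Arm tool):

```python
def combined_saving(*savings):
    """Compound successive fractional savings.

    Each saving is applied to whatever traffic remains after the
    previous one, so the results multiply instead of summing.
    """
    remaining = 1.0
    for s in savings:
        remaining *= 1.0 - s
    return 1.0 - remaining


# 42% saved first, then a further 45% of what is left.
total = combined_saving(0.42, 0.45)
print(f"{total:.0%}")  # prints "68%"
```

Simply adding the two numbers would give 87%, which overstates the result; compounding them reproduces the 68% quoted in the text.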
VR is evolving too, and we knew we needed to up our game even further to continue to lead this exciting market. More than 50% of existing mobile VR devices are powered by Mali, and the Mali-powered Mate 9 is one of the first Daydream certified VR devices available, so continuous innovation is a top priority. As you might have seen in our recent Circuit VR demo, released at GDC 2017, we’ve been working on techniques such as mobile Multiview to reduce the overhead of drawing things multiple times as you typically need to in VR (where you effectively need one complete render per eye). Add in foveated rendering, where you only see the section of the image directly in line with your fovea in high resolution, and you suddenly have four or more views to render and Multiview really comes into its own. Other techniques like Multi Sample Anti-Aliasing (MSAA) add blended pixels to either side of a line which should appear smooth in order to reduce the jagged effect which can sometimes be seen in the close quarters of the VR headset. Mali-G72 enables 8x or 16x MSAA at minimal system cost. All of this of course comes on top of already existing, clever innovations such as Adaptive Scalable Texture Compression (ASTC), allowing us to incorporate higher quality textures without compromising on the amount of bandwidth used.
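To see why Multiview matters more as the number of views grows, here is a rough counting model of how many draw submissions the application issues per frame. It is only a sketch: the draw count is a made-up figure, the function is hypothetical, and the model covers submission overhead only, since the GPU still has to shade every view.

```python
def draw_submissions(draws_per_view, views, multiview=False):
    """Count draw submissions the application issues for one frame.

    Without multiview each view is submitted separately; with multiview
    each draw is submitted once and replicated across the views.
    """
    return draws_per_view if multiview else draws_per_view * views


DRAWS = 500  # hypothetical scene complexity

for label, views in (("stereo", 2), ("stereo + foveated", 4)):
    plain = draw_submissions(DRAWS, views)
    multi = draw_submissions(DRAWS, views, multiview=True)
    print(f"{label}: {plain} submissions without multiview, {multi} with")
```

With two views the submission count halves; with the four views of a foveated stereo setup it drops to a quarter.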
I also mentioned earlier that Machine Learning is another key use case on mobile, but let me clarify what I mean by this. Today, ML is often performed in the cloud, with large data sets used to train neural networks to begin to make intelligent connections, but more and more needs to happen on device. Not only is it costly to keep transferring large amounts of data to the cloud for simple applications like translation, but it’s also slow. I don’t know about you, but I don’t have much time for latency. I expect my smartphone to do what I want, when I need it, and waiting for a connection or data transfer can put me off using even the best applications. This is why the focus is very much on directing ML inference to the device itself. Huawei have already seen the need for this in their latest premium device, the Mate 9, powered by the Mali-G71 and released a record-breaking 8 months after they first received the product. In the Mate 9, the ML algorithm establishes which applications you use the most and intelligently prioritizes the power and performance to make sure they perform at their best. Mali-G71 with its innovative Bifrost architecture is already pretty good at ML inference, as you can see in the chart below – the Mali-G71 MP8 in the Huawei Mate 9 handles AlexNet 87% quicker than a low-end discrete graphics card, which has comparable graphics performance.
Well, Mali-G72 is even better. The arithmetic optimizations and increased caches we look at later really come into their own here, reducing bandwidth to such an extent that Mali-G72 can provide the most efficient and performant ML possible. So how do we support these use cases?
Mali-G72 retains Bifrost’s key high performance features, such as full system coherency between the CPU and GPU, index-driven position shading, clause-based execution and quads, and packs a few new punches too. Optimizations in arithmetic efficiency, as well as enhanced capabilities for both complex graphics performance and scalability, make Mali-G72 the obvious choice for next year’s premium mobile products across smartphone, VR, ML and many other opportunities. But what exactly have we done?
We’ve increased tile buffer memory in order to allow the GPU to support more storage per active tile. This increases throughput in light loading situations as well as allowing greater utilization of Multi Sample Anti-Aliasing (MSAA) and Pixel Local Storage (PLS) and providing significant improvements to performance and visual quality. We’ve also rebalanced the execution engine data path to remove some rarely-used instructions and replace them with sequences of simpler instructions to reduce both area and power, lowering cost of implementation for our partners and increasing efficiency throughout the system. To support higher graphics complexity we’ve optimized the more complex operations, such as reciprocal square root, that are used most frequently, and increased caches in the tiler for better throughput. These changes improve performance scaling in high performing systems and provide a better graphics experience to the end user. In order to further reduce bandwidth we’ve increased the size of both the Level 1 cache and the writeback cache, as well as changing the instruction cache logic to allow better utilization and reduce cache misses in complex content without increasing the overall area or power. This careful balance of performance and efficiency is vital to those partners targeting a range of devices.
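To give a feel for why more tile buffer memory helps MSAA, the sketch below estimates the colour storage one tile needs as the sample count rises. The 16x16 tile size and 4 bytes per sample (an RGBA8-style format) are illustrative assumptions, not published Mali-G72 parameters, and depth/stencil and PLS data are left out.

```python
TILE_WIDTH = 16        # assumed tile width in pixels
TILE_HEIGHT = 16       # assumed tile height in pixels
BYTES_PER_SAMPLE = 4   # assumed RGBA8-style colour format


def tile_colour_bytes(samples_per_pixel):
    """Colour storage for one tile, with every MSAA sample kept on chip."""
    return TILE_WIDTH * TILE_HEIGHT * BYTES_PER_SAMPLE * samples_per_pixel


for samples in (1, 4, 8, 16):
    print(f"{samples:>2}x: {tile_colour_bytes(samples)} bytes per tile")
```

Under these assumptions storage grows linearly with the sample count, from 1024 bytes with no multisampling to 16384 bytes at 16x, which is why higher MSAA levels depend on having more storage available per active tile.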
Mali-G72, with its many innovations on the Bifrost architecture, has achieved some serious gains over the previous generation product, including 25% higher energy efficiency, 20% more performance per mm² of silicon and 17% more efficiency for Machine Learning. With all this AND 40% more in-device performance overall, it’s only a matter of time until we see the Mali-G72 exceeding our expectations in next year’s premium mobile devices.
Congratulations on the launch, Freddi, and also to our mutual customers who've already taped out this new ARM IP using Synopsys' Design and Verification Continuum Platforms. https://community.arm.com/processors/f/discussions/8587/cortex-a75-cortex-a55-mali-g72-customers-have-already-taped-out-using-synopsys-design-and-verification-tools