The Mali-G71 GPU is the latest and greatest offering in the Mali high-performance family of GPUs. Built on the brand new Bifrost architecture, Mali-G71 represents a whole new level of high-end mobile graphics capabilities whilst still maintaining Mali’s position as a leading GPU in a highly competitive market.
Mali-G71 was developed taking into account the advanced, and ever advancing, use cases for high end mobile like Virtual Reality (VR), Augmented Reality (AR) and 3D gaming; and modern APIs such as Vulkan and OpenCL 2.0. It’s been a few years since the pinnacle of mobile gaming was Snake but the industry has advanced so fast and so far since then that even today’s high-end devices could struggle with the next generation of gaming requirements. Mali-G71 aims to address this potential shortfall by looking ahead to the next level of mobile graphics and ensuring the devices it powers will be more powerful, efficient (and generally more awesome) than ever before. So much so, that devices powered by the Mali-G71 GPU are even capable of competing with mid-range laptops in terms of graphics capability.
The new Mali Bifrost architecture represents a step change in the industry and enables the future of mobile graphics. There are numerous innovations and optimizations built in to the new design but we’ll highlight just a few.
Claused shaders allow you to group sets of instructions together into defined blocks that will run to completion atomically and uninterrupted. This means we can be sure all external dependencies are in place prior to clause execution and we can design execution units to allow temporary results to bypass accesses to the register bank. This reduces the pressure on the register file, drastically decreasing the amount of power it consumes and also contributes to area reduction by simplifying the control logic in the execution units.
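To make the register bank saving concrete, here is a toy sketch that counts register file accesses for one clause, with and without bypassing temporaries. The instruction list, the `t*`/`r*` naming convention and the function `register_file_accesses` are all assumptions made for illustration; this does not model the real Bifrost instruction set or its forwarding rules.

```python
# Toy model only: each instruction is (destination, [sources]).
# Names starting with "t" are assumed to be temporaries that live and die
# inside the clause; names starting with "r" live in the register file.
clause = [
    ("t0", ["r0", "r1"]),
    ("t1", ["t0", "r2"]),
    ("r3", ["t1", "t0"]),
]

def register_file_accesses(instructions, bypass):
    """Count register file reads and writes for one clause.

    With bypass=True, temporaries are forwarded between execution units
    and never touch the register file, so they are not counted.
    """
    def touches_register_file(name):
        return not (bypass and name.startswith("t"))

    accesses = 0
    for dest, sources in instructions:
        accesses += sum(1 for s in sources if touches_register_file(s))  # reads
        accesses += 1 if touches_register_file(dest) else 0              # write
    return accesses

without_bypass = register_file_accesses(clause, bypass=False)
with_bypass = register_file_accesses(clause, bypass=True)
print(without_bypass, with_bypass)
```

In this hypothetical clause, only the three inputs and the final result touch the register file once temporaries are bypassed. Fewer register file accesses are where the power saving described above comes from.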
Claused shaders provide significant power savings
Another innovation in the Bifrost architecture is quad-based vectorization. Midgard GPUs used SIMD vectorization, which executed one thread at a time in each pipeline stage and depended heavily on the shader code containing vector instructions. Quad vectorization allows four threads to be executed together, sharing control logic. This makes it much easier to fill the execution units, achieving close to 100% utilization, and is a better fit for the way developers now write shader code.
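The difference in utilization can be illustrated with a deliberately simplified model. The four-lane width, the instruction mix and both function names below are assumptions chosen for illustration; real hardware also has to deal with thread divergence, partially filled quads and memory stalls, none of which are modelled here.

```python
# Toy utilization model. LANES and the instruction mix are assumptions.
LANES = 4

def simd_utilization(widths):
    """One thread per issue: an instruction of vector width w fills w lanes."""
    used = sum(widths)
    return used / (LANES * len(widths))

def quad_utilization(widths, active_threads=4):
    """Four threads per issue: each vector instruction is split into scalar
    issues, and the lanes are filled by threads rather than by components."""
    issues = sum(widths)               # a vec3 operation becomes 3 scalar issues
    used = issues * active_threads
    return used / (LANES * issues)

# Hypothetical shader: the component width of each instruction in order.
mix = [1, 1, 3, 4, 2, 1]

print(f"SIMD utilization: {simd_utilization(mix):.0%}")
print(f"Quad utilization: {quad_utilization(mix):.0%}")
```

In this model the SIMD figure depends entirely on how vector-heavy the shader is, while the quad figure depends only on how many of the four threads are active, which is why mostly scalar shader code no longer leaves lanes idle.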
The previous generation of high-performance mobile GPUs was scalable from 1 to 16 cores. To reflect the ever-growing performance requirements of mobile devices, Mali-G71 is scalable from 1 to 32 cores. This scalability means superior graphics performance is available across a wider range of devices than ever, from DTVs through high-end smartphones right up to cutting-edge VR headsets, whether mobile-based or standalone. This flexibility, along with the 40% improvement in area efficiency, allows our partners to configure their system to their exact requirements, striking the right balance between power, efficiency and cost in order to position their products in their target market.
Mobile gaming is fast becoming the platform of choice for gamers everywhere. In 2017 the market for mobile gaming is expected to hit over US$40 billion, up $10 billion from 2016.* This rapid growth needs to be sustainable on up-and-coming mobile devices and, with greater complexity appearing year on year, this is no mean feat. Our gaming demos from just a couple of years ago had half as many vertices as the ones we’re producing today, and this all adds up in terms of power and efficiency requirements. If applications continue to advance at this rate, the ability to scale to 32 cores could rapidly become a basic necessity for premium mobile devices. On top of this, Mali-G71 delivers 20% higher energy efficiency than Mali-T880 under similar conditions, which translates to higher sustained performance in thermally limited premium devices.
API advancements are something we take very seriously. After all, they define how developers interact with the underlying hardware. As a GPU and CPU company we need to meet developer needs so that end users get the best possible device experience. In recent years there’s been a move towards giving developers lower-level access to the hardware. In Khronos, this trend led to the emergence of the new Vulkan 1.0 API. In a similar vein, OpenCL 2.0 was developed to make heterogeneous compute more developer friendly, and there are high hopes that we will see some radical new use cases popping up once OpenCL 2.0 enabled devices are shipping in the market. Mali-G71 is not only designed to support Vulkan 1.0 and OpenCL 2.0 Full Profile; it also supports fine-grained buffers and shared virtual memory, enabled through full hardware coherency support. Again, this is primarily to ease software development effort, leading to better end user experiences.
VR is what everyone’s talking about in the graphics industry at the moment: what it takes, what it needs and how to provide the very best VR experience to the user. The Mali-G71 GPU was built with just this sort of challenge in mind. The extensive performance requirements of VR mean that GPUs for high-end devices have to be more energy efficient than ever before. Not only that, but other components of the mobile device, such as cameras and high-resolution screens, are also running at ever higher rates and so contribute to maxing out the device’s thermal budget. This puts even greater pressure on the GPU to reduce power usage wherever possible.
The Mali family of GPUs also has some great VR optimization features to allow for the best possible mobile VR experience. Front buffer rendering allows you to bypass the usual off screen buffers to render directly to the front buffer, saving time and reducing latency. Mali also supports the ‘multiview’ API extensions that allow the application to submit the draw commands for a frame to the driver once and have the driver instantiate the necessary work for each eye. This greatly reduces the CPU time required in both the application and driver. On Midgard and Bifrost based Mali GPUs we further optimize the vertex processing work, running the parts of the vertex shader that do not depend upon the eye once and sharing the results between each eye. These are just some of the features that make Mali-G71 the obvious choice for the future of mobile VR.
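The saving from sharing eye-independent vertex work can be sketched with some simple arithmetic. The cost split between eye-independent and per-eye work, the vertex count and the function `vertex_work` are all hypothetical; the real split depends entirely on the shader in question.

```python
def vertex_work(num_vertices, shared_cost, per_eye_cost, eyes=2, share=True):
    """Return abstract work units for vertex shading one stereo frame.

    shared_cost  : cost of the part of the vertex shader that does not
                   depend on the eye (assumed value)
    per_eye_cost : cost of the part that must run once per eye (assumed value)
    share        : if True, the eye-independent part runs once per vertex
                   and its result is reused for every eye
    """
    if share:
        return num_vertices * (shared_cost + eyes * per_eye_cost)
    return num_vertices * eyes * (shared_cost + per_eye_cost)

# Hypothetical frame: 10,000 vertices, 6 units of shared work and
# 2 units of per-eye work for each vertex.
naive = vertex_work(10_000, shared_cost=6, per_eye_cost=2, share=False)
shared = vertex_work(10_000, shared_cost=6, per_eye_cost=2, share=True)
print(naive, shared, f"{1 - shared / naive:.1%} less vertex work")
```

The larger the eye-independent share of the vertex shader, the closer the saving gets to half of the vertex processing work for a two-eye frame.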
We’re using our phones for more and more. These days many of us don’t even need a home computer or laptop because we can do everything we need on our phone, including downloading and viewing content and streaming it to other devices. The recently released Mali-DP650 display processor already has the capability to handle 4K content, and the Mali-G71 allows this content to be streamed seamlessly to your TV without losing any of the quality. This means that, whilst 4K hasn’t yet taken off on mobile, you don’t need to miss out on any of the benefits when viewing the content on a separate 4K device.
Mali-G71 was designed and optimized as part of a complete system, working better together as part of the Mali Multimedia Suite with CCI-550 providing full coherency for CPU and GPU. Mali-G71 is achieving the highest possible performance for mobile graphics within the smallest possible power budget and silicon area, allowing our partners to achieve the pinnacle of mobile graphics in the most scalable and customizable way. With Mali-G71 based devices expected to hit the shelves early in 2017, next level mobile gaming and graphics is right within your grasp.
If you enjoyed this blog, why not read about memory systems and Mali-G71?
Memory system is key to user experience with Mali-G71: https://community.arm.com/processors/b/blog/posts/memory-system-is-key-to-user-experience-with-cortex-a73-and-mali-g71
Very interesting!
I notice that there isn't the presence of a special-unit for doing complex math. Is the quad being used as a pipeline for operations like sqrt, div, trig, etc? Or does the unit exist, but is absent from diagrams?
And IDPS seems like it makes a lot of sense. It should save tremendous amounts of bandwidth by culling unseen geometry altogether (after initial shading) from further processing, especially given a tile-based workload that requires intermediate write-out to memory. I'm guessing that you'll cover IDPS in much more detail, so I will withhold further questions!
I'm very excited by Bifrost, and very excited to read your article!
Sean