ARM Mali-T604: New GPU & Architecture For Highest Performance & Flexibility

Jem Davies
September 11, 2013
8 minute read time.

Today we announced the ARM® Mali™-T604 GPU, the first implementation of ARM’s new Midgard architecture. The increase in screen resolutions and the demand for better-looking and more intuitive displays call for a huge increase in graphics capability. These demands for the highest levels of performance and flexibility, and for support for new APIs such as Khronos™ OpenCL™ and Microsoft® DirectX®, all in an energy-efficient way, called for a new embedded GPU architecture...

Wait. That was a bit dull. It didn’t have the why, the how, or my excitement! Let me try again:

What

At last, it’s here. We’ve been hinting, and I’ve been bursting to tell you all about our new baby for a while now, but today is the day that we finally announce to the world our new graphics processor (GPU), originally code-named Vithar, here at the ARM Technology Conference. Finally the ARM Cortex™-A15 has a companion to play with… This is the culmination of so much hard work, and I’m so proud, but first I want to give you some brief history…

The Background
Over 5 years ago, my boss asked me to go buy a graphics company to kick-start our entry into the graphics market which was clearly ready for ARM-quality IP. We looked around and closely investigated all possibilities. The team and I eliminated all the others and settled on Falanx – a start-up company in Trondheim, Norway with great technology and superb engineers. You can see more of this story in a blog to be posted by Edvard Sørgård, one of the founders, who is still with us at ARM, designing GPUs. Even then, they had ideas on the drawing-board for a new graphics architecture, which came to be called the Midgard architecture (“Midgard” is the realm of men in Norse mythology, connected to the realm of the gods by a bridge).

Since then, we’ve invested significantly, increased the size of the team hugely and now have design centers working on graphics in Trondheim, the UK, Lund in Sweden, San Jose in the US and Shanghai in China. First we made the Mali-200 and the Mali-400MP, the world’s first embedded multi-core GPU architecture. Then we took the best of the graphics design ideas, added some of ARM’s CPU and cache/bus expertise into the mix and produced the world’s best embedded GPU, the ARM Mali-T604. Well, that’s my opinion, and I will explain why.

Performance, performance, performance. (Why do I need a new GPU?)
Today’s screen resolutions span from very small mobile phone displays through to 1080p DTVs and beyond in the home. The quest for better-looking, more informative, more intuitive displays is driving huge increases in typical display resolutions, and the graphics performance required to meet those demands is proportional to the number of pixels on the display multiplied by the desired frame rate. At the same time, we are seeing around a 10-fold increase in the complexity of processing done per pixel in modern content, including games. Together, this amounts to a greater than 50-fold increase in the graphics capability needed for next-generation products, and the Mali-T604 was designed to address those needs.
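
To make the arithmetic concrete, here is a minimal sketch of that scaling rule: required capability grows with pixels per second multiplied by per-pixel complexity. The two display modes below are hypothetical examples chosen for illustration, not figures from ARM, and the function names are my own.

```python
def pixel_rate(width, height, fps):
    """Pixels that must be produced per second for a display mode."""
    return width * height * fps


def required_scale(old_mode, new_mode, per_pixel_complexity_gain):
    """Relative graphics capability needed when moving between modes.

    Each mode is a (width, height, fps) tuple. The result multiplies the
    pixel-rate ratio by the growth in per-pixel processing.
    """
    ratio = pixel_rate(*new_mode) / pixel_rate(*old_mode)
    return ratio * per_pixel_complexity_gain


# Hypothetical example modes (assumptions, not taken from the post).
wvga_30 = (800, 480, 30)
full_hd_60 = (1920, 1080, 60)

scale = required_scale(wvga_30, full_hd_60, per_pixel_complexity_gain=10)
print(f"pixel-rate ratio: {pixel_rate(*full_hd_60) / pixel_rate(*wvga_30):.1f}x")
print(f"overall capability needed: {scale:.0f}x")
```

With these assumed modes the pixel rate alone grows by roughly an order of magnitude, so a 10-fold rise in per-pixel work pushes the total well past the 50-fold mark.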

Did I say performance? I meant scalable performance
In addition to a demand for increased performance, we are seeing a demand for scalable performance, both across different designs and within a particular SoC. Partners are asking for a GPU that can be scaled for different designs: feature phones, smartphones, mobile computing devices, digital home and automotive infotainment systems. Partners are licensing Mali-T604 for use in numerous segments, using different numbers of cores in their multicore designs.

Energy-efficient performance
Partners also require dynamic control over both performance and energy use. The multicore nature of the Mali-T604 design provides the capability for SoC designers to power-off cores that are not in use, enabling them to tune their energy use to a minimum. This flexibility of fine-grained control is proving to be very popular and will keep ARM at the forefront of energy-efficient visual computing.

It’s system performance, stupid (and system energy-efficiency)
ARM makes great IP components, but that’s not enough. Mali-T604 is even better when used with the CoreLink™ CCI-400 Cache Coherent Interconnect and the Cortex-A15 processor. It was designed to be coherent with the CPU’s caches, and its ability to snoop into those caches reduces external memory bandwidth and reduces the load on the CPU.

In addition to this, Mali-T604 improves on Mali-400, which was already the world’s lowest memory bandwidth embedded GPU, and reduces that bandwidth even further through advanced (and patented) technology such as improved caching, hierarchical tiling, and transaction elimination of writes to the framebuffer.

The energy used to transfer data to DRAM is often as significant in the SoC as the energy used in the GPU itself. Reducing external memory bandwidth saves overall SoC power. Also, in modern battery-powered SoCs, the memory bandwidth limits are often the overall limits to real, achieved performance. It’s all aimed at making the real-world, delivered performance of Mali-powered SoCs the best in the world.
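
As a rough illustration of the idea behind transaction elimination (not writing out parts of the screen that did not change between frames), here is a minimal software sketch. Everything in it is an assumption made for illustration: the per-tile signature scheme, the use of CRC-32, the 16-pixel tile size and the `FrameBufferWriter` class are hypothetical and say nothing about how the hardware actually does it.

```python
import zlib

TILE = 16  # hypothetical tile edge in pixels


def tile_signature(tile_bytes):
    """Cheap signature of a tile's contents (CRC-32 is an assumption)."""
    return zlib.crc32(tile_bytes)


class FrameBufferWriter:
    """Skips writing tiles whose signature matches the last one written."""

    def __init__(self):
        self.signatures = {}    # (tile_x, tile_y) -> signature last written
        self.memory = {}        # stand-in for the framebuffer in DRAM
        self.bytes_written = 0  # external memory traffic we actually paid for

    def submit_tile(self, tile_x, tile_y, tile_bytes):
        """Write the tile unless it is unchanged; return True if written."""
        key = (tile_x, tile_y)
        sig = tile_signature(tile_bytes)
        if self.signatures.get(key) == sig:
            return False  # unchanged since last frame: eliminate the write
        self.signatures[key] = sig
        self.memory[key] = tile_bytes
        self.bytes_written += len(tile_bytes)
        return True


writer = FrameBufferWriter()
background = bytes(TILE * TILE * 4)        # all-zero RGBA tile
sprite = bytes([255]) * (TILE * TILE * 4)  # a tile with new content

# Frame 1: both tiles are new, so both are written.
writer.submit_tile(0, 0, background)
writer.submit_tile(1, 0, background)
# Frame 2: only tile (1, 0) changed, so only that tile is written.
writer.submit_tile(0, 0, background)
writer.submit_tile(1, 0, sprite)
print(writer.bytes_written)
```

Four tiles are submitted but only three are written. A signature comparison like this can, in principle, miss a change when two different tiles collide on the same signature, which is one reason to treat this purely as a sketch.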

Greater flexibility and new APIs
Those of you who have been following my recent blogs will not be at all surprised to hear that Mali-T604 is the first graphics processor from ARM supporting GPU computing (GPGPU). The Midgard architecture was designed from the start to have extra flexibility for the new APIs and the Mali-T604 product includes an implementation of OpenCL v1.1 (full profile) that supports both ARMv7/NEON CPUs and the Mali-T604 GPU, as you’d expect from a company that sells CPUs and GPUs. This realises the potential for maximising the control and use of the resources between the CPU and GPU in a system, and is a capability that is central to GPU computing using OpenCL. You can see more details of our unique joined-up OpenCL product at my colleague Rob Elliott’s presentation at TechCon.

I’ve discussed the areas in which I think GPU computing will take off in the embedded world in general in previous blogs. As systems become increasingly complex, the ability to use all of the resources in a device becomes more important, particularly while minimizing power consumption for a mobile device. Being able to share the workload of some tasks between the CPU and GPU within a system is a feature unique to Mali-T604 amongst embedded GPUs and will, I believe, enable additional resources to be used for performance-intensive applications such as image processing and augmented reality. I look forward excitedly to seeing what cool stuff designers will do with this capability.
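
One simple way to picture sharing a task between the two processors is a static split in proportion to how fast each one gets through the work. The sketch below assumes exactly that; `split_work` and the throughput numbers are hypothetical, and a real OpenCL application would measure rates by profiling and might well balance the load dynamically instead.

```python
def split_work(total_items, cpu_rate, gpu_rate):
    """Split work items between CPU and GPU in proportion to throughput.

    cpu_rate and gpu_rate are items per second. Both are hypothetical
    measurements that a real system would obtain by profiling.
    Returns (cpu_items, gpu_items).
    """
    gpu_items = round(total_items * gpu_rate / (cpu_rate + gpu_rate))
    return total_items - gpu_items, gpu_items


# Assumed example: the GPU processes this kernel three times faster.
cpu_items, gpu_items = split_work(total_items=1_000_000,
                                  cpu_rate=1.0, gpu_rate=3.0)
print(cpu_items, gpu_items)
```

With a proportional split both devices finish at about the same time, so neither sits idle while the other is still working.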

How do they do that?
The demands for the highest levels of performance and the flexibility to support new APIs such as OpenCL and Microsoft DirectX called for a new architecture. The Midgard architecture is ARM’s new architecture for the next generations of our Mali GPU family. Mali-T604 is the first implementation of the Midgard architecture, which is designed to address the demands of the evolving world of graphics and prepared to meet the challenges of using GPUs to solve other types of computational problems.

"Tri-Pipe" Architecture
Our new shader core is the heart of the new GPU and it's really cool. Based on a radical “tri-pipe” architecture using three different types of execution pipeline within the shader core, it simultaneously addresses the demands of evolving high-performance graphics and GPU computing without compromising graphics performance or efficiency. The tri-pipe architecture delivers higher levels of performance through parallelising the issue of instructions to do the three main parts of graphics and GPU computing.

The arithmetic pipeline supports full IEEE 754-2008 and has a wide range of data types from FP16, through FP32 to double-precision FP64 and all the integer types. OpenCL efficiency and performance are assured with a large number of the Built-In Function Library routines supported directly as instructions. The texture pipeline supports all the new texture formats needed for the new APIs and both it and the load/store/varyings pipeline have new features to reduce energy consumption and increase throughput in real-world memory systems.
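
To give a feel for what the three floating-point widths mean in practice, here is a small host-side sketch that stores the same value in each IEEE 754 binary format and reads it back. It uses Python's standard `struct` module and runs on any CPU; it illustrates the number formats only and is not Mali code.

```python
import struct


def round_trip(value, fmt):
    """Store value in the given IEEE 754 binary format and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


# struct format codes: 'e' = binary16, 'f' = binary32, 'd' = binary64.
FORMATS = {"FP16": "<e", "FP32": "<f", "FP64": "<d"}

for name, fmt in FORMATS.items():
    stored = round_trip(0.1, fmt)
    print(f"{name}: {stored!r}  error={abs(stored - 0.1):.3e}")
```

The rounding error shrinks as the format widens, which is the trade-off a shader or kernel author makes when choosing a narrower type to save bandwidth and energy.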

I could go on about our wonderful new GPU for ever, but I think that's enough for now. Tell me what you think... If you want a bit more, here’s a short video I made…



Follow @ARMMultimedia and #ARMMali on Twitter for Mali-T604 news and updates. Also, tell us what you are looking forward to most with the Mali-T604 technology on the ARM Facebook fan page.

Interview at ARM Techcon 2010

Jem is an ARM Fellow and likes to think of himself as "The Godfather" to technical talent in ARM. After spending some time in his youth writing software for satellites and traffic-lights among other fascinating things, Jem spotted the technical inflection point of the mobile industry: graphics, video and other visual processing. As VP of technology in the Media Processing Division of ARM, Jem is busy with a lot of projects involving the future of cool ARM technology, which will revolutionise how people experience and interact with digital devices.

  • Sean Lumly over 11 years ago
    I notice in the video you mention framebuffer compression. Does this imply that the framebuffer is compressed on-chip, or are you referring to transaction elimination?

    I also notice that there's a significant improvement in texture compression. What texture compression schemes are available to the Midgard architecture?

    Midgard looks fierce! Congrats on an amazing launch, and after seeing what Mali400 is capable of, I can only imagine the T604 with the improved bandwidth and generally more execution pipelines!
  • Jem Davies over 11 years ago
    Thanks Sean. Interesting article.
  • Sean Lumly over 11 years ago
    Thank you very much for the added information, Jem!

    I am a huge advocate for ARM's Mali GPU, and have been very impressed with the ASTC format from Tom's blog post. That it is likely to be a standard is an amazing achievement (and I can think of scarcely a better candidate based on the testimony).

    If you are interested, I have written an article on Mali Optimization that I have shared with my almost 2K Google Plus following that you are welcome to read: http://goo.gl/hxYOv

    Congrats again, and I'm really looking forward to seeing the T604 in action!
  • Jem Davies over 11 years ago
    Well spotted Sean and sorry for the delay in replying.

    I did indeed make a mistake in referring to frame buffer compression, and I did mean to talk about transaction elimination, the way in which portions of the screen which are not modified between frames aren’t written out to the frame buffer in memory, thus saving external memory bandwidth and SoC power. We're seeing great results from this feature in silicon.

    To answer your other question, in the current Midgard architecture graphics processors, the texture compression schemes available are: EAC, ETC2, NXR, BC1 through BC7 (the DXT formats). In the next release, we will include ASTC - Adaptive Scalable Texture Compression, which is set to become a new standard. Tom Olson blogged about it here: http://blogs.arm.com/multimedia/643-astc-texture-compression-arm-pushes-the-envelope-in-graphics-technology/
  • Jem Davies over 11 years ago
    Following extensive feedback from people bemoaning my lack of knowledge of Norse mythology, I've corrected the description of Midgard in Norse mythology.